I. INTRODUCTION


Among approaches to the estimation of signals, or signal parameters, in the presence of noise, the following can be distinguished:
a. Approaches in which a-priori statistics are not considered to be associated with the parameters to be estimated, versus approaches in which a-priori statistics are associated with the unknown parameters.

Typical methods utilized in connection with the former approach are maximum likelihood,(1,2) minimum variance unbiased,(3,4) and use of Cramer-Rao, Barankin, or other lower bounds.(3,4)

In cases where a-priori statistics are associated with the unknown parameters, typical approaches involve a-posteriori probability(5) and decision theoretic or Bayes optimum methods.(6)*

* One of the reviewers has pointed out that the case where, in the above terminology, a-priori statistics are not associated with the parameters can be interpreted in the sense that a-priori statistics are implicitly assumed, having approximately uniform probability density over a very wide range; and in fact he considers such an interpretation to be a necessity in relating statistical criteria to the real world. On the other hand, the data smoothing methods to be treated herein have been widely used, often without any such interpretation being made by the user.

For purposes of this paper, which largely concerns the various mathematically equivalent forms which can be taken by maximum likelihood, maximum a-posteriori, and least squares estimation procedures, the author believes this issue to be of secondary importance, and prefers to continue using a terminology according to which a-priori statistics may or may not be associated with given parameters.

The reader is free to interpret all statements in the following, to the effect that "no a-priori statistics" are associated with parameters, to be a short way of saying that a-priori statistics are assumed which have uniform density over a wide range.
b. Approaches in which the signals are regarded as deterministic except for some set of initially unknown parameters, typical of which are most treatments of the estimation of radar waveform parameters,(1,2) versus approaches in which the signals are regarded as random processes, as in Wiener filtering theory and its generalizations.(7,8)

One  may  also  distinguish  cases  in  which  the  number  of  initially 
unknown  parameters  is  finite  (as  in  most  radar  waveform  estimation 
applications),  and  those  in  which  there  is  an  infinite  set  of  initially 
unknown  parameters  (examples  of  which  will  be  given  in  this  paper). 

The  case  in  which  the  signals  are  regarded  as  random  processes 
can  be  considered  equivalent  to  that  in  which  there  is  an  infinite 
set  of  unknown  parameters  (since  the  signal  process  can  be  represented 
in  terms  of  such  a  set)  and  in  which  a-priori  statistics  are  associated 
with  these  parameters.  In  cases  where  there  are  only  a  finite  number 
of  unknown  parameters,  but  a-priori  statistics  are  associated  with 
them,  the  signal  can  also  be  regarded  as  a  random  process,  albeit 
one  whose  sample  space  is  finite  dimensional. 

Section II of this paper is devoted to treating the maximum likelihood, maximum a-posteriori, and least squares approaches to these various cases on a unified basis. In Section II.1, it is shown that the Maximum-A-Posteriori (MAP) estimate for the case where a-priori statistics are associated with the unknown parameters is under certain conditions equivalent to a Maximum Likelihood (ML) estimate in an equivalent problem in which the a-priori statistics associated with the parameters are regarded as providing additional equivalent observations of the parameters (this is not the same as the well known fact that the ML estimate is equal to the MAP estimate if the a-priori pdf's of the parameters are uniform, although the latter is a special case).

We  also  define  mixed  ML-MAP  estimates  for  cases  in  which  some 
but  not  all  of  the  parameters  are  assumed  to  have  a-priori  statistical 
distributions;  equivalent  formulations  are  then  given  where  these  same 
estimates appear as pure ML estimates or alternatively as pure MAP
estimates.

In Section II.2, attention is turned to the main subject matter of this paper: the class of generalized least squares estimation procedures, which give the ML-MAP estimates if the statistics are Gaussian, but which remain good estimates even for non-Gaussian statistics.

These are formulated for the case where a-priori statistics associated with the unknown parameters, or with some subset thereof, are regarded as providing the equivalent of additional observations, and a first order error analysis is given, from which the estimation error statistics are then derived. A result is also proved according to which parameters having a-priori statistics can be under certain conditions considered equivalent to additional noise, insofar as concerns estimation of other parameters.

The foregoing analysis is then applied in Section II.3 to linear minimum mean square error filtering theory, which, as is shown, can be regarded as an application of parameter estimation with an infinite number of parameters, with a-priori statistics associated with them, the a-priori statistics in turn being regarded as providing the equivalent of additional observational data.

In Section II.3 we also apply the results of Section II.2 to prove that additive noise may itself be considered to contribute an additional infinity of parameters to be estimated jointly with the signal parameters (the noise statistics being considered to contribute an infinite number of additional equivalent "observations"). The signal parameter estimates are shown to be in certain cases the same, whether the noise is considered as noise or as an additional number of parameters to be jointly estimated with the signal parameters.

Section III is devoted to the application of the results of Section II to the problem of obtaining recursive solutions to certain very general forms of the signal estimation problem. These recursive solutions are of the type defined and analyzed by Swerling and further investigated by Kalman, Kalman and Bucy, Blum, and others.

Recursive solutions are derived in Section III.2 for the case where the observed signal consists of the sum of K random processes called "signal" processes added to another random process called the "noise" process. The only assumption made regarding the signal processes is that they be continuous in the mean; the only assumption regarding the noise process is that it consist of the sum of two components, one of which is continuous in the mean, the other being white noise.

The problem is reduced to the pure white noise case by regarding the non-white noise component as an additional process to be estimated, and further regarding all the signal processes, as well as the non-white component of the noise process, as equivalent to an infinite number of parameters to be jointly estimated. The recursive techniques defined by Swerling(9) are then applied directly.

The  result  is  a  set  of  simultaneous,  non-linear  partial  differential 
equations,  the  solution  to  which  gives  the  desired  recursive  solution 
to  the  optimum  linear  filtering  problem.  However,  this  is  not  as  bad 
as  it  sounds,  since  the  recursive  solution  to  the  filtering  problem 
results  directly  from,  and  is  in  fact  identical  with,  the  process  of 
building  up  the  solutions  to  the  set  of  partial  differential  equations 
from  the  initial  conditions. 

Section III.3 is devoted to extending these results to cases where some components of the noise may be non-additive.

Section  IV  contains  some  discussion  of  additional  problems  and 
applications  suggested  by  the  results  of  previous  sections. 

There is also an appendix which treats an example of an estimation problem involving an infinite number of unknown parameters; in one version, the problem can be solved without associating a-priori statistics with the unknown parameters, while a slightly modified version cannot be properly formulated or solved without associating a-priori statistics with the unknown parameters.

This  paper  is  not  completely  self-contained,  since  heavy 
reliance  is  placed  on  the  results  of  References  2  and  9.  The  most 
important  formulas  required  are  reproduced,  but  the  discussion  of  a 
number  of  points  is  abbreviated,  with  reference  being  made  to  further 
discussion  in  the  cited  papers. 


II.  ML,  MAP,  and  Generalized  Least  Squares  Estimates 


II.1. Maximum Likelihood and Maximum A-Posteriori Estimates

Let S represent a set of observational data, and let x represent a parameter (possibly multi-dimensional, or even infinite-dimensional) upon which the probability distribution of S depends. Suppose that x has an a-priori probability density function p(x) associated with it.

We will suppose that the joint probability density function of S and x exists and is denoted p(S, x). The existence of the conditional probability densities p(S | x) and p(x | S) is also assumed. Then, the maximum a-posteriori (MAP) estimate $\hat{x}$ of x is that value of x which maximizes p(x | S). (In the following, the symbol S will be used to denote both the observed value of a random variable and the argument of various pdf's; also, the symbol p will be used to denote a variety of pdf's with, hopefully, confusion avoided by the fact that the argument of p will indicate which pdf we are talking about.)

Now suppose that the a-priori pdf of x, p(x), is symmetrical about some point $\bar{x}$:

$$ p(x) = f(x - \bar{x}) = f(-x + \bar{x}) \tag{1} $$

(If p(x) is also unimodal, then $\bar{x}$ would be the MAP estimate of x based on just the a-priori information.)

The following equivalent estimation problem can then be formulated:

$\bar{x}$ will be regarded as the observed value of an additional "equivalent" or "virtual" observation $S^{(e)}$. This additional observation will be represented as

$$ S^{(e)} = x + \varepsilon^{(e)} \tag{2} $$

where $\varepsilon^{(e)}$ = equivalent observation error.

(Throughout the paper, pdf's are defined with respect to Lebesgue measure in finite-dimensional sample spaces. Many results are derived for infinite-dimensional sample spaces, but these are always derived by valid limiting processes from finite-dimensional approximations.)


As stated, the observed value of the random variable $S^{(e)}$ in any case is assumed to be $\bar{x}$, so the value of the equivalent observation error is $\bar{x} - x$. Here, x may be a multi-component vector $(x_1, x_2, \ldots)$.

The random variable $S^{(e)}$ is assumed to have pdf characterized by the conditions:

$$ E(S^{(e)}) = x \tag{3} $$

pdf of $\varepsilon^{(e)}$ $\equiv$ a-priori pdf of $x - \bar{x}$

This amounts to saying

$$ p[S^{(e)} \mid x] = f[S^{(e)} - x] = f[x - S^{(e)}] \tag{4} $$

In this equivalent problem, we regard x as a constant, and $S^{(e)}$ as a random variable with pdf given by (4). It is also assumed that the equivalent random variable $\varepsilon^{(e)}$ is statistically independent of S.

Now, the maximum likelihood estimate of x for this equivalent problem is obtained as follows. The likelihood function is

$$ p[S, S^{(e)} \mid x] = p[S \mid x]\, p[S^{(e)} \mid x] = \frac{p(x \mid S)\, p[S^{(e)} \mid x]\, q(S)}{p(x)} \tag{5} $$

where $q(S) = \int p(S, x)\, dx$.

The maximum likelihood estimate of x is obtained by substituting the observed values of S and $S^{(e)}$ into (5) and then maximizing with respect to x. However, the observed value of $S^{(e)}$ is $\bar{x}$. Thus, when $\bar{x}$ is substituted for $S^{(e)}$ in (5), and use is then made of (4) and (1), the factor $p[S^{(e)} \mid x]$ cancels p(x), and we find that the maximum likelihood estimate of x is obtained by maximizing p(x | S) and is thus equal to the MAP estimate for the original problem.
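The equivalence can be checked numerically. The following is a minimal sketch, not taken from the paper, that assumes a scalar parameter, a Gaussian observation error, and a Gaussian a-priori pdf centered at `x_bar`; all numbers and function names are hypothetical. A grid search maximizes the a-posteriori density of the original problem and, separately, the likelihood of the equivalent problem in which the virtual observation takes the value `x_bar`.

```python
import math

def log_gauss(z, sigma):
    """Log of a zero-mean Gaussian pdf with standard deviation sigma at z."""
    return -0.5 * (z / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def argmax_on_grid(objective, lo, hi, steps):
    """Return the grid point in [lo, hi] at which objective is largest."""
    best_x, best_val = lo, objective(lo)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        val = objective(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x

# Hypothetical numbers, chosen only for illustration.
S_obs = 2.0      # actual observation, S = x + noise
sigma_n = 1.0    # standard deviation of the actual observation error
x_bar = 0.0      # point of symmetry of the a-priori pdf
sigma_p = 0.5    # standard deviation of the a-priori pdf

def log_posterior(x):
    # Original problem: log p(S | x) + log p(x), up to a constant in x.
    return log_gauss(S_obs - x, sigma_n) + log_gauss(x - x_bar, sigma_p)

def log_likelihood_equivalent(x):
    # Equivalent problem: log p(S | x) + log p(S_e | x), S_e observed as x_bar.
    S_e = x_bar
    return log_gauss(S_obs - x, sigma_n) + log_gauss(S_e - x, sigma_p)

x_map = argmax_on_grid(log_posterior, -5.0, 5.0, 20000)
x_ml = argmax_on_grid(log_likelihood_equivalent, -5.0, 5.0, 20000)

# Closed-form Gaussian MAP estimate: precision-weighted mean of S_obs and x_bar.
closed_form = (S_obs / sigma_n**2 + x_bar / sigma_p**2) / (
    1.0 / sigma_n**2 + 1.0 / sigma_p**2)
print(x_map, x_ml, closed_form)
```

The two objectives differ only in whether the a-priori term is read as a density of x or as the density of a virtual observation given x, which is why the symmetry condition (1) is what makes them coincide.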

We can also define mixed ML-MAP estimates for the case where some but not all of the unknown parameters have a-priori statistics. Suppose we write

$$ x = (u, v) \tag{6} $$

(each of u and v may be multi-dimensional).

Also suppose that an a-priori pdf p(u | v) is associated with u (possibly the a-priori pdf of u depends on v as a parameter) but no a-priori statistics are associated with v. Then, the mixed ML-MAP estimate of x is defined to be

$$ \hat{x} = \text{ML-MAP estimate of } x = \text{value which maximizes } p(S \mid x)\, p(u \mid v) \tag{7} $$

We can define $\hat{x}$ also as a pure MAP estimate by associating with v an a-priori pdf p(v) which is uniform over an extremely wide range of values. If the resulting pdf for (u, v) = x is then denoted p(u, v) = p(x), then $\hat{x}$ is the value which maximizes p(S | x) p(x).

$\hat{x}$ can also be represented as a pure ML estimate by considering the a-priori statistics associated with u to be equivalent to providing an additional "virtual" observation $S^{(e)}$ whose observed value is $\bar{u}$ and whose pdf is given by

$$ E[S^{(e)} \mid x] = u \tag{8} $$

$$ p[S^{(e)} \mid x] = f[S^{(e)} - u \mid v] = f[u - S^{(e)} \mid v] \tag{9} $$

where it has been assumed that p(u | v) is of the form

$$ p(u \mid v) = f[u - \bar{u} \mid v] = f[\bar{u} - u \mid v] \tag{10} $$
$\hat{x} = (\hat{u}, \hat{v})$ is then obtained by maximizing, with respect to (u, v), the quantity

$$ p[S, S^{(e)} \mid u, v] = p(S \mid u, v)\, p[S^{(e)} \mid u, v] \tag{11} $$

with the value $\bar{u}$ substituted for $S^{(e)}$ and the actual observational data substituted for S.

In  the  rest  of  this  paper,  when  we  adjoin  equivalent  observational 
data,  in  the  above-described  manner,  to  the  actual  observed  data,  in 
order  to  obtain  an  ML  estimate  equal  to  the  original  MAP  or  mixed 
ML-MAP  estimate,  we  will  call  these  "virtual"  or  "equivalent" 
observations  as  distinguished  from  the  "actual"  observations.  Such 
"equivalent  observations"  may  be  regarded  as  the  parameter  estimates 
which  would  be  made  if  there  were  no  actual  observed  data  but  only  a- 
priori  statistics  for  the  parameters;  they  will  generally  be  denoted 
by  a  superscript  "e".  The  author  has  found  the  concept  of  virtual 
observations  a  convenient  way  of  looking  at  things  in  many  applications, 
but  that  is  all  that  is  claimed  for  it. 


II.2. Generalized Least Squares Estimates


Attention will now be turned to a class of generalized least squares estimates analyzed extensively by Swerling.(2,9)

Suppose the observational data is given by

$$ S_\mu = f_\mu(x, t_\mu) + \varepsilon_\mu \tag{12} $$

where

$S_\mu$ = observation (a real scalar)

$f_\mu(x, t_\mu) = f_\mu(x_1, x_2, \ldots, t_\mu)$ = value of observation in the absence of observation error

$\varepsilon_\mu$ = observation error for the $\mu$th observation

$x = (x_1, x_2, \ldots)$ = finite or infinite set of unknown parameters (real scalars), not yet assumed to have a-priori statistics

$t_\mu$ = time of observation (assumed known)


Consider the class of estimation procedures defined as follows: $\hat{x}_i$ are estimates obtained by minimizing, with respect to $(x_1, x_2, \ldots)$, the quantity

$$ Q = \sum_{\mu,\nu} \eta_{\mu\nu}\, [S_\mu - f_\mu(x, t_\mu)]\, [S_\nu - f_\nu(x, t_\nu)] \tag{13} $$

where

$(\eta_{\mu\nu})$ = arbitrary symmetric positive definite matrix (not necessarily the inverse covariance matrix of $\{\varepsilon_\mu\}$; see below)
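To make the definition of Q concrete, here is a minimal sketch that assumes a hypothetical two-parameter straight-line model $f_\mu(x, t_\mu) = x_1 + x_2 t_\mu$ and an arbitrary symmetric positive definite smoothing matrix. The data, the matrix, and the function names are illustrative assumptions, not taken from the paper. Because the model is linear in x, the minimizer follows from a 2 x 2 set of normal equations.

```python
def line(x, t_mu):
    """Hypothetical observation function f(x, t) = x[0] + x[1] * t."""
    return x[0] + x[1] * t_mu

def quadratic_form_Q(x, S, t, eta, f):
    """Q = sum over mu, nu of eta[mu][nu] * r[mu] * r[nu], r = S - f(x, t)."""
    n = len(S)
    r = [S[m] - f(x, t[m]) for m in range(n)]
    return sum(eta[m][k] * r[m] * r[k] for m in range(n) for k in range(n))

def minimize_Q_line(S, t, eta):
    """Closed-form minimizer of Q for the straight-line model."""
    n = len(S)
    F = [[1.0, t[m]] for m in range(n)]          # partial derivatives of f
    B = [[sum(eta[m][k] * F[m][i] * F[k][j]
              for m in range(n) for k in range(n))
          for j in range(2)] for i in range(2)]
    c = [sum(eta[m][k] * F[m][i] * S[k]
             for m in range(n) for k in range(n)) for i in range(2)]
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(B[1][1] * c[0] - B[0][1] * c[1]) / det,
            (B[0][0] * c[1] - B[1][0] * c[0]) / det]

# Illustrative data and a diagonally dominant (hence positive definite) eta.
t = [0.0, 1.0, 2.0, 3.0]
S = [1.1, 2.9, 5.2, 6.8]
eta = [[2.0, -0.5, 0.0, 0.0],
       [-0.5, 2.0, -0.5, 0.0],
       [0.0, -0.5, 2.0, -0.5],
       [0.0, 0.0, -0.5, 2.0]]

x_hat = minimize_Q_line(S, t, eta)
q_min = quadratic_form_Q(x_hat, S, t, eta, line)
print(x_hat, q_min)
```

For a model that is not linear in x the same Q would have to be minimized iteratively; the closed form here is specific to the linear assumption.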


Before  proceeding,  several  comments  should  be  made: 


a) The subscript $\mu$ on $f_\mu$ indicates that the functional dependence of the observations on the unknown parameters may differ for each observation. However, $f_\mu$ is assumed to be a known function. (If $f_\mu$ is not exactly known, this can still be represented by assuming $f_\mu$ to be known and compensating for this assumption by adding an equivalent observation error.(9))
b) The functions $f_\mu$ need not depend explicitly on $t_\mu$. Also, $t_\mu$ may be replaced by one or more spatial or space-time parameters, or for that matter, by an element of an abstract parameter set. However, for each observation, $t_\mu$ (or the abstract parameter replacing it) is assumed known. If in a practical case $t_\mu$ is not perfectly known, we can still assume it to be known and compensate for this assumption by adding an equivalent observation error.

c) In most cases of interest, one or more components of the observational data may consist of functions of a continuous time (or other) parameter, of the form

$$ S(t) = g(t, x_1, x_2, \ldots) + \varepsilon(t), \quad t \in (T_1, T_2) \tag{14} $$

In such a case, the term in Q contributed by such observations must be understood as the limiting form of finite sums. Such limiting forms can be well-defined and are often expressible as integrals.(1,2,4,14) (More precisely, this applies to those terms of Q remaining after terms which are independent of x are subtracted.)

For example, we could use discrete sampled times $\{t_\mu\}$ and have

$$ S_\mu = S(t_\mu), \qquad (\eta_{\mu\nu}) = \text{inverse of the matrix } (\Phi(t_\mu, t_\nu)) \tag{15} $$

where $\Phi$ is a covariance function (not necessarily that of $\varepsilon(t)$; see below).

The corresponding term of Q is the limit as the set $\{t_\mu\}$ becomes dense in $(T_1, T_2)$.
Alternatively one can consider the observed data to be the coefficients of S(t) with respect to a complete orthonormal set of functions $\{\psi_\mu(t)\}$ over $(T_1, T_2)$:

$$ S_\mu = \int_{T_1}^{T_2} S(t)\, \psi_\mu(t)\, dt \tag{16} $$

$$ f_\mu(x) = \int_{T_1}^{T_2} g(t, x)\, \psi_\mu(t)\, dt \tag{17} $$

(This is an example where $f_\mu$ does not depend explicitly on $t_\mu$.)

If $\varepsilon(t)$ is considered a sample function of a random process with covariance function $\Phi$, then a convenient choice for $\{\psi_\mu\}$ is the set of orthonormal eigenfunctions associated with $\Phi$ over $(T_1, T_2)$.

With  this  understanding,  we  will  continue  to  write  Q  as  a 
double  sum. 

Now,  this  method  of  estimation  results  in  maximum  likelihood 
estimates  under  the  following  two  conditions: 


A. $\{\varepsilon_\mu\}$ are jointly Gaussian random variables.

B. $\{\varepsilon_\mu\}$ have zero means, and $(\eta_{\mu\nu})$ is the inverse of the covariance matrix of the variables $\{\varepsilon_\mu\}$.

(For  continuous  time  data,  the  appropriate  limiting  statements  are 
understood. ) 

If these conditions hold, certain statements can be made about optimality of the resulting estimates in the sense that they are minimum variance unbiased estimates, either precisely or asymptotically (see below). However, it is worth emphasizing that the estimates may still be good ones even if one or both of these conditions fail.

For example, if the statistics are not Gaussian, but condition B is satisfied, the variances of $(\hat{x}_i - x_i)$ are in many cases not affected; the only thing affected is what type of statements can be made about the optimality of the estimates.(2)
If condition B is not satisfied, there will in general be some degradation in estimation accuracy. However, it may be desirable from other points of view to use smoothing matrices $(\eta_{\mu\nu})$ not satisfying condition B. For example, it may be convenient for computational purposes to use a diagonal smoothing matrix even if the observation errors are correlated; or, one may not know the error covariances exactly. The increased computational convenience may be worth the decrease in accuracy. In any event, the degradation in accuracy caused by failure of condition B can usually be computed, as is described in the section of Reference 2 entitled "Mismatched Processing of Received Signals." Reference 2 also treats the degradation in estimation accuracy caused if the functions $f_\mu$ utilized in Q do not exactly describe the true dependence of the observations (in the absence of observation error) on the parameters $x_1, x_2, \ldots$.

Thus, we will adopt the point of view that the method of minimizing Q defines estimates which may be good estimates whether or not conditions A and B are satisfied; in the special case where they are satisfied, the estimates are ML.
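The size of the degradation from a mismatched smoothing matrix can be illustrated on a small case. The sketch below is not the computation of Reference 2. It assumes the simplest hypothetical model, two observations $S_\mu = x + \varepsilon_\mu$ of one scalar parameter with correlated errors, and compares the error variance obtained with the matched matrix (the inverse covariance) against a diagonal matrix that ignores the correlation. All numbers and names are illustrative.

```python
def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def error_variance(eta, cov):
    """Variance of x_hat - x when Q is minimized with smoothing matrix eta.

    Model assumed here: S[mu] = x + eps[mu], mu = 0, 1, with eps having
    zero means and covariance matrix cov. Minimizing Q gives the linear
    estimate x_hat = sum_mu w[mu] * S[mu], with weights proportional to
    the column sums of eta, so the variance is w^T cov w.
    """
    col_sums = [eta[0][k] + eta[1][k] for k in range(2)]
    total = col_sums[0] + col_sums[1]
    w = [s / total for s in col_sums]
    return sum(w[m] * cov[m][k] * w[k] for m in range(2) for k in range(2))

# Hypothetical error covariance with correlated observation errors.
cov = [[1.0, 0.5],
       [0.5, 2.0]]

eta_matched = inv2(cov)                       # condition B satisfied
eta_diagonal = [[1.0 / cov[0][0], 0.0],       # correlation ignored
                [0.0, 1.0 / cov[1][1]]]

var_matched = error_variance(eta_matched, cov)
var_diagonal = error_variance(eta_diagonal, cov)
print(var_matched, var_diagonal)
```

In this example the diagonal matrix costs only a small increase in variance, which is the kind of trade the text describes between computational convenience and accuracy.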

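If the errors $\varepsilon_\mu$ are sufficiently small, the estimates $\hat{x}_i$ resulting from minimization can be written to first order as(2,9):

$$ \hat{x}_i - x_i = \sum_j [B^{-1}(x)]_{ij}\, Y_j(x) + \text{higher order terms} \tag{18} $$

$$ Y_j(x) = \sum_{\mu,\nu} \eta_{\mu\nu}\, \varepsilon_\mu\, \frac{\partial f_\nu(x, t_\nu)}{\partial x_j} \tag{19} $$

$$ B_{ij}(x) = \sum_{\mu,\nu} \eta_{\mu\nu}\, \frac{\partial f_\mu(x, t_\mu)}{\partial x_i}\, \frac{\partial f_\nu(x, t_\nu)}{\partial x_j} \tag{20} $$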
It should be emphasized that Eqs. (18) - (20) depend only on the definition of the method of obtaining $\hat{x}_i$ by minimizing Q; they do not depend on any statistical interpretation of the quantities $\varepsilon_\mu$ nor on whether conditions A and B hold. (In Eqs. (18) - (20) and other places below, as will be clear from the context, $x = (x_1, x_2, \ldots)$ denote the true values of the parameters.)
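The origin of the first order expansion can be sketched as follows, assuming only that the $f_\mu$ are differentiable, that $(\eta_{\mu\nu})$ is symmetric, and that products of small quantities are dropped. This is a reconstruction of the standard argument and not a quotation from References 2 or 9. Setting the gradient of Q to zero at the estimate and linearizing the residuals about the true x gives

```latex
\begin{align*}
0 &= \left.\frac{\partial Q}{\partial x_i}\right|_{x=\hat{x}}
   = -2 \sum_{\mu,\nu} \eta_{\mu\nu}\,
     \frac{\partial f_\mu(\hat{x}, t_\mu)}{\partial x_i}\,
     \bigl[S_\nu - f_\nu(\hat{x}, t_\nu)\bigr], \\
S_\nu - f_\nu(\hat{x}, t_\nu)
  &\approx \varepsilon_\nu
   - \sum_j \frac{\partial f_\nu(x, t_\nu)}{\partial x_j}\,(\hat{x}_j - x_j), \\
\sum_j \Bigl[\sum_{\mu,\nu} \eta_{\mu\nu}\,
     \frac{\partial f_\mu}{\partial x_i}\,
     \frac{\partial f_\nu}{\partial x_j}\Bigr] (\hat{x}_j - x_j)
  &\approx \sum_{\mu,\nu} \eta_{\mu\nu}\, \varepsilon_\mu\,
     \frac{\partial f_\nu}{\partial x_i}.
\end{align*}
```

The bracketed matrix on the left is the matrix B(x) and the right side is the quantity $Y_i(x)$, so inverting B gives the first order expression for $\hat{x}_i - x_i$. No statistical assumption about $\varepsilon_\mu$ enters at any step.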

If condition B is satisfied, and if the higher order terms can be neglected, then

$$ E[(\hat{x}_i - x_i)(\hat{x}_j - x_j)] = [B^{-1}(x)]_{ij} \tag{21} $$


We will now proceed to define a generalized least squares smoothing technique for the case where a-priori statistics are associated with some subset of the parameters $x_1, x_2, \ldots$.

Let us suppose that I is a subset of (1, 2, ...), not necessarily a proper subset. Suppose that a joint a-priori statistical distribution is associated with those $x_i$ for $i \in I$.

We will assume that Eq. (12) defines the actual observations. The generalized least squares estimates are obtained by adjoining to Q a term corresponding to the "equivalent observations" provided by the a-priori statistics.

In this case, Q is defined by

$$ Q = \sum_{\mu,\nu} \eta_{\mu\nu}\, [S_\mu - f_\mu(x, t_\mu)]\, [S_\nu - f_\nu(x, t_\nu)] + \sum_{k,\ell \in I} \Phi_{k\ell}\, [\bar{x}_k - x_k]\, [\bar{x}_\ell - x_\ell] \tag{22} $$

where

$(\Phi_{k\ell})$ is an arbitrary symmetric positive definite matrix

$\bar{x}_k$, $k \in I$ are arbitrary constants
Thus, each component of x which is associated with a-priori statistics is represented by an "observation" whose effective dependence on x is given by

$$ f_i(x) = x_i, \quad i \in I \tag{23} $$

and whose "observed value" is $\bar{x}_i$.

Of course, it is clear that in order to make good use of the a-priori information, $\Phi_{k\ell}$ and $\bar{x}_k$ are not going to be completely arbitrary; in fact, $\bar{x}_k$ are going to be something like the means of the a-priori pdf's of $x_k$, and $\Phi_{k\ell}$ are going to be related to the a-priori covariance matrix of $\{x_k - \bar{x}_k\}$. However, for present purposes we can regard any deviation of $(\Phi_{k\ell})$ from the inverse covariance matrix of $\{x_k - \bar{x}_k\}$ as analogous to a deviation of $(\eta_{\mu\nu})$ from the inverse covariance matrix of $\{\varepsilon_\mu\}$ in the case of the actual observations; and any deviation of $\bar{x}_k$ from the means of the a-priori distributions as analogous to a deviation of the functions $f_\mu$ from those which truly describe the dependence of the actual observations on x.
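The role of the adjoined term can be made concrete with a minimal sketch. It assumes a hypothetical straight-line model $f_\mu = x_1 + x_2 t_\mu$ with a-priori information on $x_1$ only (that is, I = {1}); the data, the weight `phi`, and the function names are illustrative assumptions. The estimate is computed twice: once by adding the a-priori penalty directly to the normal equations, and once by appending one extra "observation" with observed value `x1_bar` and observation function $f(x) = x_1$ to the data and running ordinary generalized least squares.

```python
def normal_equations(S, F, eta):
    """Return (B, c) with B = F^T eta F and c = F^T eta S for two parameters."""
    n = len(S)
    B = [[sum(eta[m][k] * F[m][i] * F[k][j]
              for m in range(n) for k in range(n))
          for j in range(2)] for i in range(2)]
    c = [sum(eta[m][k] * F[m][i] * S[k]
             for m in range(n) for k in range(n)) for i in range(2)]
    return B, c

def solve2(B, c):
    """Solve the 2 x 2 linear system B x = c."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(B[1][1] * c[0] - B[0][1] * c[1]) / det,
            (B[0][0] * c[1] - B[1][0] * c[0]) / det]

# Illustrative actual observations.
t = [0.0, 1.0, 2.0]
S = [1.2, 1.9, 3.2]
F = [[1.0, tm] for tm in t]
eta = [[1.0, 0.2, 0.0],
       [0.2, 1.0, 0.2],
       [0.0, 0.2, 1.0]]

# Hypothetical a-priori information on x1 only.
x1_bar = 0.5
phi = 4.0

# (i) Penalty form: add phi * (x1_bar - x1)^2 to Q.
B, c = normal_equations(S, F, eta)
B[0][0] += phi
c[0] += phi * x1_bar
x_penalty = solve2(B, c)

# (ii) Virtual-observation form: one extra row with f(x) = x1.
S_aug = S + [x1_bar]
F_aug = F + [[1.0, 0.0]]
eta_aug = [row + [0.0] for row in eta] + [[0.0] * len(eta) + [phi]]
B_aug, c_aug = normal_equations(S_aug, F_aug, eta_aug)
x_virtual = solve2(B_aug, c_aug)

print(x_penalty, x_virtual)
```

The block-diagonal form of `eta_aug` encodes the assumption that the virtual observation error is uncorrelated with the actual observation errors.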

We wish to apply Eqs. (18) - (20) to obtain the first order dependence of $\hat{x}_i - x_i$ on $\{\varepsilon_\mu\}$ and $\{\bar{x}_k - x_k\}$, $k \in I$. To facilitate this, we introduce the following notation:

$x = (u, v)$

$u = \{x_i\}$, $i \in I$

$\bar{u} = \{\bar{x}_i\}$, $i \in I$

$v = \{x_i\}$, $i \notin I$
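Then, to first order in $\{\varepsilon_\mu\}$ and $\{\bar{x}_k - x_k\}$, $k \in I$:

$$ \hat{x}_i - x_i = \sum_j [B^{-1}(x)]_{ij}\, Y_j(x) + \sum_{k \in I} D_{ik}(x)\, (\bar{x}_k - x_k) \tag{24} $$

$$ Y_j(x) = \sum_{\mu,\nu} \eta_{\mu\nu}\, \varepsilon_\mu\, \frac{\partial f_\nu(x, t_\nu)}{\partial x_j} \tag{25} $$

$$ D_{ik}(x) = \sum_{j \in I} [B^{-1}(x)]_{ij}\, \Phi_{jk} \tag{26} $$

$$ B_{ij}(x) = \sum_{\mu,\nu} \eta_{\mu\nu}\, \frac{\partial f_\mu(x, t_\mu)}{\partial x_i}\, \frac{\partial f_\nu(x, t_\nu)}{\partial x_j} + \Phi'_{ij} \tag{27} $$

$$ \Phi'_{ij} = \begin{cases} \Phi_{ij} & \text{if } i, j \in I \\ 0 & \text{if } i \notin I \text{ or } j \notin I \end{cases} \tag{28} $$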

Now, consider the following conditions:

A': $\{\varepsilon_\mu\}$ are jointly Gaussian; and $\{x_k\}$, $k \in I$ have an a-priori joint Gaussian pdf.

B': $\{\varepsilon_\mu\}$ have means zero; $\{x_k\}$, $k \in I$ have means $\bar{x}_k$; and $\{x_k - \bar{x}_k\}$, $k \in I$ are a-priori uncorrelated with $\{\varepsilon_\mu\}$. Also, $\{\varepsilon_\mu\}$ have inverse covariance matrix $(\eta_{\mu\nu})$; $\{x_k - \bar{x}_k\}$, $k \in I$ have inverse covariance matrix $(\Phi_{k\ell})$. For the sake of simplicity, $(\Phi_{k\ell})$ is also assumed to be independent of $\{x_i\}$, $i \notin I$.

If A' and B' are satisfied, $\hat{x}_i$ are the ML-MAP estimates (for all i).


We will now use Eqs. (24) - (28) to determine the covariance matrix $E[(\hat{x}_i - x_i)(\hat{x}_j - x_j)]$ of the estimation errors supposing that just B' holds. It is also assumed that the first order equation (24) is valid (i.e., that higher order terms are negligible).

The covariance matrix of the estimation errors $\hat{x}_i - x_i$ will be computed with respect to the statistical ensemble defined by the statistics of the actual observation errors $\{\varepsilon_\mu\}$ and the a-priori statistics of $\{u_i\}$, $i \in I$. This can be done simply by forming the products $(\hat{x}_i - x_i)(\hat{x}_j - x_j)$ from Eqs. (24) - (28), taking expected values with respect to the joint pdf of $\{\varepsilon_\mu\}$, and then taking the expected value with respect to the a-priori joint pdf of $\{u_i\}$, $i \in I$.

The result is the following:

$$ E[(\hat{x}_i - x_i)(\hat{x}_j - x_j)] = [B^{-1}(\bar{u}, v)]_{ij} \tag{29} $$


An important special case is that where B is independent of x, in which case the right side of Eq. (29) becomes simply $[B^{-1}]_{ij}$. Many important applications fall into this category.
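For the special case where B is independent of x, the result can be verified directly. The following is a sketch in matrix notation under condition B', assuming the first order error expression in which the estimation error is $B^{-1}$ applied to the sum of the weighted observation errors and the weighted a-priori deviations. Here F denotes the matrix of partial derivatives $\partial f_\mu / \partial x_i$, $\Phi'$ is $\Phi$ padded with zeros outside I, and the entries of $\bar{x} - x$ outside I are irrelevant because $\Phi'$ annihilates them; these abbreviations are introduced only for this sketch.

```latex
\begin{align*}
\hat{x} - x &\approx B^{-1}\bigl[F^{T}\eta\,\varepsilon + \Phi'(\bar{x} - x)\bigr],
  \qquad B = F^{T}\eta F + \Phi', \\
E\bigl[(\hat{x} - x)(\hat{x} - x)^{T}\bigr]
  &= B^{-1}\Bigl[F^{T}\eta\,E(\varepsilon\varepsilon^{T})\,\eta F
     + \Phi'\,E\bigl((\bar{x} - x)(\bar{x} - x)^{T}\bigr)\,\Phi'\Bigr]B^{-1} \\
  &= B^{-1}\bigl[F^{T}\eta F + \Phi'\bigr]B^{-1} = B^{-1}.
\end{align*}
```

The cross terms vanish because the observation errors are uncorrelated with the a-priori deviations, the first bracketed term uses $E(\varepsilon\varepsilon^{T}) = \eta^{-1}$, and the second uses the fact that the a-priori covariance restricted to I is $\Phi^{-1}$.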

The above equations also clarify the sense in which the adjoining of "equivalent" observations is actually equivalent to the a-priori statistics.

Let the expected value of $(\hat{x}_i - x_i)(\hat{x}_j - x_j)$, given that the true value is x, with respect to the (fictitious) statistical ensemble defined by the actual observations and equivalent observations, with x regarded as a constant, be denoted by $E^{(e)}[(\hat{x}_i - x_i)(\hat{x}_j - x_j) \mid x]$.

Then, from Eq. (21), to first order,

$$ E^{(e)}[(\hat{x}_i - x_i)(\hat{x}_j - x_j) \mid x] = [B^{-1}(x)]_{ij} \tag{30} $$

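Thus, Eq. (29) says that, provided the first order expansion (24) holds for $\{\bar{u}_k - u_k\}$ as well as for $\{\varepsilon_\mu\}$,

$$ E[(\hat{x}_i - x_i)(\hat{x}_j - x_j)] = E^{(e)}[(\hat{x}_i - x_i)(\hat{x}_j - x_j) \mid \bar{u}, v] \tag{31} $$

It is also true that, to first order,

$$ E^{(e)}[(\hat{x}_i - x_i)(\hat{x}_j - x_j) \mid \bar{u}, v] = \int E^{(e)}[(\hat{x}_i - x_i)(\hat{x}_j - x_j) \mid u, v]\, p(u)\, du \tag{32} $$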


Thus,  provided  the  first  order  expansions  are  valid,  the 
"equivalence"  of  the  a-priori  statistics  to  the  adjoining  of 
"equivalent"  observations  applies  not  only  in  the  sense  that  the 
MAP  estimates  in  the  original  problem  are  equal  to  the  ML  estimates 
in  the  equivalent  problem  (which  holds  regardless  of  the  validity  of 
the  first  order  expansions),  but  also  in  that  the  estimation  error 
covariance  matrix  for  the  original  problem  can  be  derived  from  that 
for  the  equivalent  problem  via  Eq.  (31)  or  Eq.  (32). 

In the case where the matrix B is independent of x, then so also are both sides of Eq. (31).*

* The expectations in Eqs. (29), (31), and (32) are conditional on v having a definite value. If one wishes to interpret "no a-priori statistics" as implicitly meaning "uniform-density a-priori statistics," then it must be understood that these expectations are not taken over these uniform a-priori statistics of v (unless we are dealing with a case where B is independent of v).
Another useful form of the first order equations for $\hat{x}_i$ can be stated in the case where all components of x have a-priori statistics, i.e., where x = u.

From Eqs. (20), (21), (23) and (25) of Ref. 9 one can state that, to first order,

$$ \hat{x}_i - \bar{x}_i = \sum_j [B^{-1}(\bar{x})]_{ij}\, P_j(\bar{x}) \tag{33} $$

$$ P_j(x) = \sum_{\mu,\nu} \eta_{\mu\nu}\, \frac{\partial f_\mu(x, t_\mu)}{\partial x_j}\, r_\nu(x, t_\nu) \tag{34} $$

$$ r_\nu(x, t_\nu) = S_\nu - f_\nu(x, t_\nu) \tag{35} $$


Attention will now be turned to the statement of a result according to which, in some cases, parameters having a-priori statistics can be considered equivalent to additive noise, insofar as concerns estimation of other parameters; or conversely, noise can be considered to be represented by additional parameters to be estimated.

Our initial statement of this result can, however, be stated in a form which does not involve statistical concepts:

Let

$$ x = (u, v) \tag{36} $$

(Here, the notation (u, v) does not have the same significance as before; a-priori statistics will be associated with some components of u and with all components of v.)
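Suppose the observation functions are of the form

$$ f_\mu(x, t_\mu) = g_\mu(u, t_\mu) + h_\mu(v, t_\mu) \tag{37} $$

and let

$$ Q(u, v) = \sum_{\mu,\nu} \eta_{\mu\nu}\, [S_\mu - g_\mu(u, t_\mu) - h_\mu(v, t_\mu)]\, [S_\nu - g_\nu(u, t_\nu) - h_\nu(v, t_\nu)] + \sum_{k,\ell} \Phi_{k\ell}\, (\bar{u}_k - u_k)(\bar{u}_\ell - u_\ell) + \sum_{k,\ell} \Lambda_{k\ell}\, (\bar{v}_k - v_k)(\bar{v}_\ell - v_\ell) \tag{38} $$

In Eq. (38), the sum involving terms $\Phi_{k\ell}$ is extended over a subset (not necessarily proper) of the indices of $\{u_k\}$; i.e., $\Phi_{k\ell} = 0$ unless both k and $\ell$ belong to some subset I of the indices of u. However, $(\Phi_{k\ell})$ is assumed to be positive definite when k, $\ell$ are restricted to I. On the other hand, the matrix $(\Lambda_{k\ell})$ is assumed to be positive definite with k, $\ell$ ranging over all the indices of $\{v_k\}$.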

Suppose  that  the  estimates  u,  v  are  obtained  by  finding  those 
values  which  minimize  Q(u,  v)  with  respect  to  u  and  v. 

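Now consider

$$Q^*(u) = \sum_{\mu,\nu} \eta^*_{\mu\nu}
  \left[ S_\mu - g_\mu(u, t_\mu) - h_\mu(\bar{v}, t_\mu) \right]
  \left[ S_\nu - g_\nu(u, t_\nu) - h_\nu(\bar{v}, t_\nu) \right]
  + \sum_{k,\ell} \zeta_{k\ell} (u_k - \bar{u}_k)(u_\ell - \bar{u}_\ell) \qquad (39)$$

where

$$\eta^* = (\eta^{-1} + \Phi)^{-1} \qquad (40)$$

and

$$\Phi_{\mu\nu} = \sum_{k,\ell} (\xi^{-1})_{k\ell}
  \frac{\partial h_\mu(\bar{v}, t_\mu)}{\partial v_k}
  \frac{\partial h_\nu(\bar{v}, t_\nu)}{\partial v_\ell} \qquad (41)$$

(In Eq. (40), $\eta^*$, $\eta$, and $\Phi$ are matrices.)

Let $\hat{u}^*$ be the estimate of u obtained by minimizing $Q^*(u)$ with
respect to u. Then,

$$\hat{u}^* = \hat{u} \quad \text{to first order} \qquad (42)$$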


It is to be noted that the result stated in Eqs. (36) - (42)
has been stated, and can be proved, entirely without recourse to
statistical concepts or interpretations. Before outlining the
proof, however, the statistical motivation will be described:

Suppose we now regard $\{\epsilon_\mu\}$ as a random process with zero
means and inverse covariance matrix $(\eta_{\mu\nu})$, and suppose

(a) a-priori statistics are associated with some of the components
of u, having means $\bar{u}_k$ and inverse covariance matrix
$(\zeta_{k\ell})$;

(b) a-priori statistics are associated with all of the components
of v, having means $\bar{v}_k$ and inverse covariance matrix
$(\xi_{k\ell})$;

(c) $\{\epsilon_\mu\}$, $\{u_k\}$, $\{v_k\}$ are statistically
uncorrelated.

Write the actual observations as

$$S_\mu = f_\mu(u, v, t_\mu) + \epsilon_\mu
       = g_\mu(u, t_\mu) + h_\mu(v, t_\mu) + \epsilon_\mu
       \cong g_\mu(u, t_\mu) + h_\mu(\bar{v}, t_\mu) + \epsilon_\mu
         + \sum_k (v_k - \bar{v}_k)
           \frac{\partial h_\mu(\bar{v}, t_\mu)}{\partial v_k} \qquad (43)$$

If we regard

$$\epsilon_\mu + \sum_k (v_k - \bar{v}_k)
  \frac{\partial h_\mu(\bar{v}, t_\mu)}{\partial v_k}$$

as the "noise", then this noise will be a random process with mean
zero and inverse covariance matrix $\eta^*$. Consequently, Eq. (42) can
be interpreted as follows:

Suppose the original problem, in which u and v are to be jointly
estimated, is replaced by another problem in which u only is to be
estimated; v is eliminated from the problem, and also the virtual
observations equivalent to the a-priori statistics of v are eliminated.
The virtual observations associated with u are retained unaltered.

The actual observations are replaced by

$$S_\mu = f^*_\mu(u, t_\mu) + \epsilon^*_\mu \qquad (44)$$

where

$$f^*_\mu(u, t_\mu) = g_\mu(u, t_\mu) + h_\mu(\bar{v}, t_\mu) \qquad (45)$$

and $\{\epsilon^*_\mu\}$ is a random process with zero means and inverse
covariance matrix $\eta^*$.

$\hat{u}^*$ is the estimate for u obtained in this second problem (i.e.,
the ML-MAP estimate if conditions A' and B' hold).

Equation (42) states that, to first order, if A' and B' hold,
the ML-MAP estimate $\hat{u}$ in the first problem is equal to the ML-MAP
estimate $\hat{u}^*$ in the second problem.
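
The equivalence asserted by Eq. (42) can be checked numerically in the
linear case, where it holds exactly. The following is a minimal sketch
under stated assumptions, not the author's procedure: u and v are
scalars, $g_\mu(u) = a_\mu u$ and $h_\mu(v) = b_\mu v$, the matrix $\eta$
is a scalar multiple of the identity, and all numerical values (and the
names `a`, `b`, `eta_star_form`) are illustrative. With these choices
$\Phi = b b^T / \xi$, and $\eta^*$ is expanded with the Sherman-Morrison
identity so that no matrix inverse is needed.

```python
def dot(p, q):
    """Inner product of two equal-length lists."""
    return sum(pi * qi for pi, qi in zip(p, q))

# Assumed linear model: S_mu = a_mu * u + b_mu * v + noise (illustrative values).
a = [1.0, 2.0, -1.0, 0.5]      # g_mu(u) = a_mu * u
b = [0.5, -1.0, 1.5, 2.0]      # h_mu(v) = b_mu * v
S = [1.2, 0.7, -0.3, 2.1]      # observations
eta, zeta, xi = 4.0, 0.5, 2.0  # eta matrix = eta * I; prior weights for u and v
u_bar, v_bar = 0.3, -0.2       # a-priori means

# Problem 1: minimise Q(u, v) jointly (2x2 normal equations, Cramer's rule).
A11 = eta * dot(a, a) + zeta
A12 = eta * dot(a, b)
A22 = eta * dot(b, b) + xi
r1 = eta * dot(a, S) + zeta * u_bar
r2 = eta * dot(b, S) + xi * v_bar
u_joint = (r1 * A22 - A12 * r2) / (A11 * A22 - A12 * A12)

# Problem 2: drop v and use eta* = (eta^-1 + Phi)^-1 with Phi = b b^T / xi.
def eta_star_form(p, q):
    """p^T eta* q, with eta* expanded by the Sherman-Morrison identity."""
    return eta * dot(p, q) - eta**2 * dot(p, b) * dot(b, q) / (xi + eta * dot(b, b))

y = [s - bm * v_bar for s, bm in zip(S, b)]   # S_mu - h_mu(v_bar)
u_reduced = (eta_star_form(a, y) + zeta * u_bar) / (eta_star_form(a, a) + zeta)

print(u_joint, u_reduced)   # the two estimates agree up to rounding
```

In a non-linear problem the same comparison would hold only to first
order, as stated in the text.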

The proof involves some rather tedious matrix algebra. An
outline is as follows:

If the functions g and h are reasonably well-behaved, the
estimate $\hat{u}$ can be obtained as follows (assuming we can effectively
restrict the problem to the immediate neighborhood of $\bar{u}$, $\bar{v}$):

First find, for any fixed u, the value $\hat{v}(u)$ which minimizes
Q(u, v) with respect to v.

Then let

$$Q_1(u) = Q[u, \hat{v}(u)] \qquad (46)$$

Then, $\hat{u}$ is that value which minimizes $Q_1(u)$.

The proof of Eq. (42) then consists in verifying that, to first
order, $Q_1(u) = Q^*(u)$.

II. 3. The Linear Case with an Infinite Set of Parameters

The primary aim of this subsection is to provide an example of
the foregoing analysis by applying the results of Section II. 2 to
the linear case with an infinity of unknown parameters; as will be
seen, this amounts to another way of looking at standard linear
minimum mean square error filtering theory. First, brief discussions
will be given of linearity vs. non-linearity, and of finite vs.
infinite parameter sets.
The Linear Case

We will define the linear case to be that case in which the
functions $f_\mu$ depend linearly on the unknown parameters. In general,
this may be written

$$f_\mu(x, t_\mu) = \sum_i x_i \, g_{\mu i}(t_\mu) + h(t_\mu) \qquad (47)$$

where $g_{\mu i}(t)$ and h(t) are known functions.

It should be noted that the virtual observations which are
equivalent to a-priori statistics are of the form given by Eq. (1)
and are therefore automatically in the linear form. Thus, if the
functions $f_\mu$ describing the actual observations satisfy Eq. (47), then
all the observations, both actual and virtual, are of the linear form.
In such a case, the following statements can be made:

(a) The estimates $\{\hat{x}_i\}$, if conditions A' and B' are satisfied,
are exact minimum variance unbiased estimates. If only B' is satisfied,
they are still exact minimum variance unbiased linear estimates.

In the case of non-linear dependence of $f_\mu$ on the parameters,
one can still in many cases say that the estimates $\{\hat{x}_i\}$ are
asymptotically minimum variance; this is discussed further shortly.

(b) All the results obtained "to first order" in Section II. 2 can
now be said to hold exactly without restriction on the magnitudes
of $\{\epsilon_\mu\}$ or of $\{\hat{x}_i - x_i\}$. Also, the matrices B
and partial derivatives $\partial f_\mu / \partial x_i$ are independent
of the values of $\{x_i\}$.
In the non-linear case, for sufficiently small values of $\{\epsilon_\mu\}$
and of $\{\hat{x}_i - x_i\}$, the problem can be linearized and the results
obtained to first order will be approximately correct.

Actually, in many cases, this linearization will lead to correct
results even in cases where the individual values of $\{\epsilon_\mu\}$ and
$\{\hat{x}_i - x_i\}$ are fairly large, provided the total (integrated)
signal-to-noise ratio, in some appropriately defined sense, is large
(Ref. 2). However, there are some subtle pitfalls connected with
determining the requirements on output signal-to-noise ratio in order to
ensure that the results obtained from the linearized problem are correct.
This is discussed at some length in Ref. 2.

One can give the following heuristic condition for the signal-
to-noise ratio required in order that the solutions obtained from
the linearized problem be approximately correct.

Suppose that the true parameter values are denoted by
$x^{\mathrm{true}} = \{x_i^{\mathrm{true}}\}$, and that there exists a
region R containing $x^{\mathrm{true}}$ such that, for all
$x'$, $x''$ in R, and all $\mu$,

$$f_\mu(x'', t_\mu) = f_\mu(x', t_\mu)
  + \sum_i (x''_i - x'_i) \frac{\partial f_\mu(x', t_\mu)}{\partial x_i}
  + \text{remainder} \qquad (48)$$

where the remainder term is negligible within R.

Also suppose that the output signal-to-noise ratio is sufficiently
high so that, with probability approaching unity, a preliminary
estimate $x_o$ can be obtained with $x_o \in R$.
Then, with probability essentially equal to unity, one can
replace the original problem with the linearized problem in which
the observations are replaced by $S_\mu - f_\mu(x_o, t_\mu)$; the
parameters to be estimated are replaced by $\{x_i - x_{o,i}\}$; and the
functions $f_\mu$ are replaced by

$$f^*_\mu = \sum_i (x_i - x_{o,i})
  \frac{\partial f_\mu(x_o, t_\mu)}{\partial x_i} \qquad (49)$$

Thus, the condition is that the signal-to-noise ratio be
sufficiently high that the problem can, with probability essentially
equal to unity, be confined to a region R around the true parameter
values where the variation of $f_\mu$ with $\{x_i\}$ is linear except
for a negligible remainder.

Finite  vs.  Infinite  Parameter  Sets 

In typical cases, the estimation errors due to observation
errors in least-squares smoothing methods of the kind under
discussion increase as the number of parameters to be estimated
increases. In many cases, as the number of parameters to be estimated
approaches infinity, the estimation errors become equal to the
observation errors so that all smoothing is lost.

For example, suppose

$$S(t) = x(t) + \epsilon(t) \qquad (50)$$

where x(t) is some function of time, the signal, to be estimated, and
$\epsilon(t)$ is the noise. If we allow x(t) to be represented by a
countable infinity of parameters without a-priori statistics, and then
apply generalized least-squares smoothing, the resulting estimates are

$$\hat{x}(t) = S(t) \qquad (51)$$

and the estimation error is just $\epsilon(t)$.

Ordinarily,  one  gets  smoothing  by  fitting  x(t)  by  a  set  of 
functions  depending  on  a  small  number  of  parameters,  such  as  low- 
order  polynomials  or  trigonometric  series.  This  reduces  the 
estimation  errors  due  to  observation  noise,  but  if  the  actual 
functions  x(t)  do  not  belong  precisely  to  the  set  of  functions  used 
in  the  fitting  procedure,  another  kind  of  error  is  introduced  which 
is  sometimes  called  "bias  error"  (although  it  has  nothing  to  do  with 
biases  in  the  observation  errors). 

Usually, the procedure is "optimized" by choosing the number
of parameters, e.g., the order of the polynomial or the number of
terms in the trigonometric series, so that the sum of the "bias"
errors and the errors due to observation noise is minimized. This
"optimization" is facilitated if one has some sort of a-priori
knowledge as to how closely the functions x(t) which actually
characterize the observations can be approximated by functions
belonging to the set used to fit the observations.

According to the viewpoint adopted here, smoothing can be
retained even though x(t) continues to be represented by an infinite
(countable) parameter set, provided these parameters are given a
joint a-priori statistical distribution. This is equivalent to
adding an infinite set of virtual observations which is sufficient to
retain full smoothing even with an infinite number of parameters.

Of course, another way of looking at it would be that this is
equivalent to regarding x(t) as a random process, and is in fact
just another way of interpreting the standard linear optimum
filtering in which the signal as well as the noise is regarded as a
random process.

This  equivalence  will  be  made  explicit  in  a  moment.  However, 
it  would  be  well  to  mention,  at  this  point,  that  examples  can  be 
found  of  problems  in  which  there  are  an  infinite  number  of  unknown 
parameters,  but  in  which  smoothing  can  be  obtained  without  associating 
a-priori  statistics  with  any  of  the  unknown  parameters.  An  example 
of  this  sort  is  given  in  the  appendix. 

To make the above-described interpretation of linear optimum
filtering explicit, suppose

$$S(t) = x(t) + \epsilon(t), \quad \tau_1 \le t \le \tau_2 \qquad (52)$$

where x(t) and $\epsilon(t)$ are sample functions of random processes
$\{x(t)\}$, $\{\epsilon(t)\}$. It is assumed that one knows a-priori that
$\{x(t)\}$ is defined and continuous in the mean over an interval
$(T_1, T_2)$ containing $(\tau_1, \tau_2)$, with zero mean and covariance
function $\Phi(s, t)$; while $\{\epsilon(t)\}$ is defined and continuous
in the mean over $(\tau_1, \tau_2)$, with zero mean and covariance
function $\Psi(s, t)$. $\{x(t)\}$ and $\{\epsilon(t)\}$ are assumed
to be statistically uncorrelated.

Now, let $\hat{x}(t)$ be the estimate of x(t) obtained from standard
linear least squares theory for $t \in (T_1, T_2)$. (If
$t \in (\tau_1, \tau_2)$ we have a true filtering or interpolation
problem, otherwise an extrapolation problem.)

Now, consider the following equivalent problem. Suppose the
actual observations are S(t), $\tau_1 \le t \le \tau_2$. Also suppose the
virtual observations are defined as follows:

$$\bar{x}(t) = \text{"observed value" of } x(t) = 0,
  \quad T_1 \le t \le T_2 \qquad (53)$$

(since our assumption is that $\{x(t)\}$ is a zero mean process).

$$\text{Virtual observation error} = \{\bar{\epsilon}(t)\},
  \quad T_1 \le t \le T_2 \qquad (54)$$

where $\{\bar{\epsilon}(t)\}$ is a zero mean random process with
covariance function $\Phi(s, t)$. (The actual observation error is the
same as before.)

Let $\hat{x}^*(t)$ be the generalized least squares estimate obtained
for this equivalent version of the problem, $t \in (T_1, T_2)$. Then,

$$\hat{x}^*(t) = \hat{x}(t) \qquad (55)$$

This is actually a consequence of the results previously proved;
however, a direct verification is possible. This can most simply be
obtained by considering the discrete-time case in which t is
restricted to discrete values $\{t_\mu\}$. The estimates for the
continuous time parameter can be obtained by a limiting process from the
discrete-time results (Refs. 2, 14).
The estimation process for the equivalent least squares smoothing
technique takes the following form: let

$$Q(x) = {\sum_{\mu,\nu}}' \, \eta_{\mu\nu}
  \left[ S(t_\mu) - x(t_\mu) \right] \left[ S(t_\nu) - x(t_\nu) \right]
  + \sum_{\mu,\nu} \zeta_{\mu\nu} \, x(t_\mu) \, x(t_\nu) \qquad (56)$$

where

$$\eta = \Psi^{-1}, \qquad \zeta = \Phi^{-1} \qquad (57)$$

with $\Psi$ and $\Phi$ denoting the covariance matrices of the actual and
virtual observation errors at the discrete time points.

The parameters to be estimated are $x_\mu = x(t_\mu)$,
$t_\mu \in (T_1, T_2)$. The second sum in Eq. (56) is extended over the
whole interval $(T_1, T_2)$, while the first sum is extended over only
those $t_\mu$ in $(\tau_1, \tau_2)$, as indicated by the prime symbol. The
parameter estimates $\hat{x}^*(t_\mu)$ are obtained by minimizing Q(x)
with respect to $x(t_\mu)$.

The fact that Eq. (55) holds, where $\hat{x}(t_\mu)$ are the estimates
resulting from standard linear least squares filtering theory, can
be verified directly by minimizing Q(x) with respect to $x(t_\mu)$ by
setting the partials $\partial Q / \partial x_\mu = 0$, and comparing the
results with the standard formulas for $\hat{x}(t_\mu)$, as for example
given in Ref. 14. As a specific example, suppose
$(T_1, T_2) = (\tau_1, \tau_2)$. Then the estimates $\hat{x}^*(t_\mu)$
resulting from minimization of Q in Eq. (56) are given (in vector
notation) by

$$\hat{x}^* = (\eta + \zeta)^{-1} \eta \, S \qquad (58)$$

which is easily shown to be identical to the more familiar form (Ref. 14)

$$\hat{x} = \zeta^{-1} (\eta^{-1} + \zeta^{-1})^{-1} S \qquad (58a)$$
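
The identity between Eqs. (58) and (58a) can be illustrated with a
small numerical check. The sketch below assumes two time points and
made-up covariance matrices `Phi` (signal, i.e. virtual observation
error) and `Psi` (actual observation error); the 2x2 helper functions
are hypothetical conveniences, not part of the paper.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def add2(m, n):
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]

def mul2(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec2(m, v):
    return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]

Phi = [[1.0, 0.6], [0.6, 1.0]]   # assumed signal covariance at t_1, t_2
Psi = [[0.5, 0.1], [0.1, 0.5]]   # assumed noise covariance at t_1, t_2
S = [0.8, -0.4]                  # illustrative observations

eta, zeta = inv2(Psi), inv2(Phi)

# Eq. (58): generalized least squares form.
x_58 = matvec2(mul2(inv2(add2(eta, zeta)), eta), S)

# Eq. (58a): zeta^-1 (eta^-1 + zeta^-1)^-1 S = Phi (Psi + Phi)^-1 S.
x_58a = matvec2(mul2(Phi, inv2(add2(Psi, Phi))), S)

print(x_58, x_58a)   # the two vectors agree up to rounding
```

The second form is the usual "signal covariance times inverse of total
covariance" expression of linear minimum mean square error filtering.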


Incidentally, insofar as concerns estimation of any particular
value $x(t_o)$, $t_o \in (T_1, T_2)$, the estimate $\hat{x}^*(t_o)$
requires only that the interval over which the random process
$\{x(t)\}$ is defined contain $(\tau_1, \tau_2)$ and the point $t_o$;
$\hat{x}^*(t_o)$ will be independent of whether $\{x(t)\}$ is actually
defined over the full interval $(T_1, T_2)$ if the latter is larger than
$(\tau_1, \tau_2)$. This is also, of course, true for $\hat{x}(t_o)$.

Another result which is a direct consequence of the results
stated at the end of Section II. 2 is as follows:

Let

$$S(t) = \sum_{i=1}^{n} x^{(i)}(t) \qquad (59)$$

where $x^{(i)}(t)$ are sample functions from random processes
$\{x^{(i)}(t)\}$ which have zero means, covariance functions
$\Phi^{(i)}(s, t)$, and are mutually uncorrelated.

Suppose the indices i = 1, ..., n are divided into two sets,
I and I', and that the processes $\{x^{(i)}(t)\}$ for $i \in I$ are
regarded as "signals", while the "noise" is given by

$$\epsilon(t) = \sum_{i \in I'} x^{(i)}(t) \qquad (60)$$

Then, the estimate $\hat{x}^{(i_o)}(t)$ for a particular index
$i_o \in I$ is independent of $I - i_o$. That is, so long as
$i_o \in I$, the estimate of $x^{(i)}(t)$ for $i = i_o$ is the same
regardless of which of the remaining processes $x^{(i)}(t)$,
$i \ne i_o$, are considered as signals to be jointly estimated, and
which are lumped into the noise.

We can even go to the extreme of considering all of the processes
i = 1, ..., n, as signals to be jointly estimated. In the equivalent
generalized least squares formulation, the "actual" observations
would then be considered to be error-free and the only errors would
be associated with the "virtual" observations.
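
A scalar illustration of this independence property may be helpful.
The sketch assumes three zero-mean, mutually uncorrelated quantities
observed only through their sum at a single time point, with made-up
variances `phi`; it compares the estimate of the first quantity when
the other two are lumped into the noise against the estimate obtained
when the second is estimated jointly.

```python
phi = [2.0, 1.0, 0.5]   # assumed variances of x1, x2, x3 at one time point
S = 1.4                 # illustrative observed sum x1 + x2 + x3

# Case 1: I = {1}; x2 + x3 is lumped into the noise.
x1_alone = phi[0] / (phi[0] + phi[1] + phi[2]) * S

# Case 2: I = {1, 2}; only x3 is noise. Minimise
#   (S - x1 - x2)^2 / phi3 + x1^2 / phi1 + x2^2 / phi2
# via its 2x2 normal equations (Cramer's rule).
A11 = 1 / phi[2] + 1 / phi[0]
A12 = 1 / phi[2]
A22 = 1 / phi[2] + 1 / phi[1]
r = S / phi[2]                     # right-hand side is (r, r)
x1_joint = (r * A22 - A12 * r) / (A11 * A22 - A12 * A12)

print(x1_alone, x1_joint)   # identical up to rounding
```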


III. Recursive Solutions of Signal Estimation Problems

III. 1. Preliminary Discussion

Suppose one has a signal vector $x = (x_1, \ldots, x_m)$ which may be
a function of time; observational data which depend on the values
of the vector x; and additive observation errors. Recursive methods
for producing the generalized least squares estimate of $\{x_i\}$ at any
time t have been studied by Swerling (Refs. 9, 10), Kalman (Refs. 11, 12),
Bucy (Ref. 12), Blum (Ref. 13), and others. These solutions have the
feature that optimum estimates based on previous data are combined with
additional observational data in an optimum way to produce new optimum
estimates.

Swerling (Refs. 9, 10) treated initially the case (either linear or
non-linear) where the vector x is constant; then the modified case
where x may depend on time but where the variation of x with time
has known functional form; and finally the case where the variation
of x with time has a component of unknown functional form, but
without associating a-priori statistics with the unknown components of
the time variation of x.

Kalman and Bucy (Refs. 11, 12) treat the linear case, and also give
the extension to the case where x is regarded as a random process,
with essentially the assumption that both the signal x and the
observation noise are projections of vector Markov processes.

Blum (Ref. 13) generalizes these recursive methods, in a manner
somewhat different from the other papers mentioned, to cases where
the observation errors are correlated.
It is the purpose of this section to exhibit recursive solutions
which yield the linear optimum estimates for the case where x is a
random process, with very few restrictive assumptions on the statistics
of x or of the noise process. Essentially, the signal processes are
assumed to be continuous in the mean; the noise process is assumed to
consist of a component which is continuous in the mean and a white noise
component; and that is all. Section III. 2 treats the linear case with
additive noise; and Section III. 3 gives the "first order" treatment of
the non-linear case, where some of the noise may be non-additive.

The method of approach is as follows: all problems of this
type are reduced to an equivalent problem in which

(a) There is a (possibly) infinite set of parameters to be estimated.

(b) The parameters to be estimated are regarded as constants
(independent of time).

(c) A-priori statistics are associated with the parameters to be
estimated and are represented in the generalized least squares
procedure in the form of equivalent virtual observations.

(d) The observation errors are regarded as uncorrelated, i.e., have
a covariance function of the form $R(t)\,\delta(s - t)$. This is
accomplished by regarding everything except the "white noise"
component of the observation error as represented by parameters
to be estimated.

When the problem has been reduced to the form described by
(a) - (d), formulas of Swerling (Ref. 9) can be applied directly. The
requisite formulas will be reproduced here, in the form applicable
to the discrete-time case. The result for a continuous time parameter
is then obtained by a limiting process.
Suppose we have observed data given by Eq. (12) above. Suppose
the matrix $(\eta_{\mu\nu})$ can be broken up into blocks. The fundamental
result of Swerling (Ref. 9) is that a recursive procedure can be set up,
in which the observational data corresponding to each block of $\eta$ is
treated as a separate stage; at each stage, say the s-th, a generalized
least squares smoothing of the observational data in the s-th stage,
together with the estimates based on all previous stages, is defined.
The basic result is that this sequence of generalized least squares
estimates can be defined in such a way that the resulting estimates
(say, after the s-th stage) are, to first order, identical with those
resulting from the non-recursive smoothing of all s stages using
the original matrix $\eta$. In the linear case, the qualifying phrase
"to first order" can be dropped. The specific form of the necessary
recursive sequence is exhibited.

The results assume a particularly simple form if the original
smoothing matrix $\eta$ is diagonal (or at least is diagonal after some
point). In this case, each observation $S_\mu$ can be considered a
separate stage. We will refer to this as introducing the observations
one-by-one.

Before exhibiting the formulas necessary for the ensuing
application, a few comments are in order about the interpretation of
these stagewise or recursive procedures. The most important comment is
that Swerling's basic result (Ref. 9) need not be interpreted
statistically; that is, it can be stated and proved without recourse to
statistical notions. It holds regardless of the statistics of the
observation error, and in particular, of whether the basic smoothing
matrix $\eta$ is or is not the inverse covariance matrix of the
observation errors. Consequently, even if the errors are correlated, one
is still at liberty to use a diagonal $\eta$ and thus to use one-by-one
recursive methods, although the result may not be statistically optimum.

Although the basic result need not be interpreted statistically,
all the usual statistical consequences can be derived from it when
specific statistics are associated with the observation errors. For
example, if conditions A and B of Section II hold, the results of the
non-recursive method are ML estimates, and consequently the results of
the recursive method are, to first order, the ML estimates.

If the original weighting matrix $\eta$ is not the inverse covariance
matrix of the observation errors, then the accuracy of the resulting
estimates will be degraded (the estimates will not be statistically
optimum); the amount of accuracy degradation can be computed (Ref. 2) and
may be an acceptable price to pay, for example, for the computational
convenience of using a diagonal matrix $\eta$.

Blum (Ref. 13) considers recursive estimation procedures that can be
applied without loss of statistical optimality to certain cases
where the observation errors are correlated and where, in fact,
there is no way to break up their covariance matrix into blocks.
His approach is to assume that the observation errors satisfy a
non-homogeneous difference equation of order, say, k; his recursive
procedure then involves the previous k + 1 estimates instead of just
the last estimate.

The approach to be followed here (summarized above in statements
(a) - (d)) is applicable to either correlated or uncorrelated observation
errors, and results in all cases in statistically optimum estimates
(under the appropriate conditions A' and B'). By employing step (d),
regarding the correlated part of the errors as parameters to be
estimated, the problem is reduced to one in which a diagonal $\eta$ may be
used without loss of statistical optimality. Also, by reducing the entire
problem to an equivalent one where the parameters all appear as
non-stochastic, it is possible to dispense with any assumption that either
signal or observation noise are projections of Markov processes. Also,
no matrix inversions are required at any stage of the procedure. The
necessary equations are equivalent to Eqs. (23), (25), (46), (47), (48)
of Ref. 9.

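Thus, suppose the matrix $\eta$ is diagonal:

$$\eta_{\mu\nu} = \eta_\mu \, \delta_{\mu\nu} \qquad (61)$$

Let $x = (x_1, \ldots, x_m)$ where $x_i$ are constants. Denote by
$\hat{x}_i(s)$ the estimate of $x_i$ based on the first s observations.
(We will also assume that there is some initial estimate $\hat{x}_i(0)$
but need not specify at this point just how this is obtained.)

Then, the one-by-one recursive procedure is defined as follows,
where $\hat{x}(s) = [\hat{x}_1(s), \ldots, \hat{x}_m(s)]$:

$$\hat{x}(s) - \hat{x}(s-1)
  = D^{(s)}[\hat{x}(s-1)] \; p^{(s)}[\hat{x}(s-1)] \qquad (62)$$

$$D^{(s)}_{ij}(x) = D^{(s-1)}_{ij}(x)
  - \frac{\eta_s \, d^{(s)}_i(x) \, d^{(s)}_j(x)}
         {1 + \eta_s \sum_{k,\ell=1}^{m} D^{(s-1)}_{k\ell}(x)
          \frac{\partial f_s(x, \tau_s)}{\partial x_k}
          \frac{\partial f_s(x, \tau_s)}{\partial x_\ell}} \qquad (63)$$

$$d^{(s)}_i(x) = \sum_{k=1}^{m} D^{(s-1)}_{ik}(x)
  \frac{\partial f_s(x, \tau_s)}{\partial x_k} \qquad (64)$$

$$p^{(s)}_i[\hat{x}(s-1)] = \eta_s
  \left\{ S_s - f_s[\hat{x}(s-1), \tau_s] \right\}
  \frac{\partial f_s[\hat{x}(s-1), \tau_s]}{\partial x_i} \qquad (65)$$

In the above, $\hat{x}$, $p^{(s)}$, and $d^{(s)}$ are m-component
vectors; $D^{(s)}$ is an m x m matrix; $\tau_s$ is the time of the s-th
observation.

The initial values of $\hat{x}$ and D, i.e., $\hat{x}(0)$ and $D^{(0)}$,
have not been defined; but this will be done when the intended
application is made.

In the linear case, the matrices $D^{(s)}$ and the partials
$\partial f_s / \partial x_i$ are independent of the values of x or
$\hat{x}(s)$.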
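
The one-by-one procedure can be sketched in a few lines for the linear
case. The following is an illustrative sketch rather than a transcript
of Ref. 9: it assumes a linear model $f_s(x) = \sum_k g_{s,k} x_k$, a
rank-one update of D that avoids any matrix inversion, and made-up data,
initial estimate and initial matrix $D^{(0)}$. The function name
`recursive_step` and the variable names are hypothetical. The result is
compared with a direct (non-recursive) solution of the same weighted
least squares problem, with $D^{(0)}$ playing the role of an a-priori
covariance.

```python
def recursive_step(x, D, g, S_s, eta_s):
    """One stage of the one-by-one update for f_s(x) = sum_k g[k] * x[k]."""
    m = len(x)
    d = [sum(D[i][k] * g[k] for k in range(m)) for i in range(m)]   # d = D g
    denom = 1.0 + eta_s * sum(g[j] * d[j] for j in range(m))
    D_new = [[D[i][j] - eta_s * d[i] * d[j] / denom for j in range(m)]
             for i in range(m)]                                     # updated D
    resid = S_s - sum(g[k] * x[k] for k in range(m))
    p = [eta_s * resid * g[i] for i in range(m)]                    # weighted residual
    x_new = [x[i] + sum(D_new[i][k] * p[k] for k in range(m))
             for i in range(m)]                                     # x(s) = x(s-1) + D p
    return x_new, D_new

# Illustrative data: (gradient g_s, observation S_s, weight eta_s).
obs = [([1.0, 0.0], 1.1, 2.0), ([1.0, 1.0], 2.9, 1.0),
       ([1.0, 2.0], 5.2, 4.0), ([1.0, 3.0], 6.8, 0.5)]
x_rec = [0.0, 0.0]                    # assumed initial estimate
D_rec = [[10.0, 0.0], [0.0, 10.0]]    # assumed initial D
for g, S_s, eta_s in obs:
    x_rec, D_rec = recursive_step(x_rec, D_rec, g, S_s, eta_s)

# Non-recursive check: solve (D0^-1 + sum eta g g^T) x = sum eta g S directly.
B11 = 0.1 + sum(e * g[0] * g[0] for g, _, e in obs)
B12 = sum(e * g[0] * g[1] for g, _, e in obs)
B22 = 0.1 + sum(e * g[1] * g[1] for g, _, e in obs)
c1 = sum(e * g[0] * s for g, s, e in obs)
c2 = sum(e * g[1] * s for g, s, e in obs)
det = B11 * B22 - B12 * B12
x_batch = [(c1 * B22 - B12 * c2) / det, (B11 * c2 - B12 * c1) / det]

print(x_rec, x_batch)   # recursive and batch estimates agree up to rounding
```

In the non-linear case the gradient `g` would be re-evaluated at the
current estimate at each stage, and agreement with the batch solution
would hold only to first order.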

III.  2.  Application  to  the  Linear  Case 

We will assume that the observational data up to any time
instant $\tau$ is

$$S(t) = \sum_{i=1}^{n-1} u^{(i)}(t) + \epsilon^*(t),
  \quad \tau_1 \le t \le \tau \qquad (66)$$

where $u^{(i)}(t)$ is a sample function of a random process
$\{u^{(i)}(t)\}$. The processes $\{u^{(i)}(t)\}$ will be interpreted as
signal processes and $\{\epsilon^*(t)\}$ as a noise process.

It will become obvious that the same technique may be applied
to more general situations in which there is not one, but say, J
received signals, each containing a different linear combination of
the random processes $\{u^{(i)}(t)\}$:

$$S_j(t) = \sum_{i=1}^{n-1} a_{ji} \, u^{(i)}(t) + \epsilon^*_j(t) \qquad (66a)$$

with j = 1, ..., J. The method of doing this will be outlined below.
The following assumptions will be made about these random
processes:

There is some basic interval $(T_1, T_2)$ containing $(\tau_1, \tau)$
within which it is known a-priori that:

(a) $\{u^{(i)}(t)\}$, i = 1, ..., n - 1, are random processes which are
mutually uncorrelated; have zero means; and are continuous in the
mean with covariance functions $\Phi^{(i)}(t, t')$.

(b) $\{\epsilon^*(t)\}$ is a random process which is uncorrelated with
all the processes $\{u^{(i)}(t)\}$, and which can be written

$$\{\epsilon^*(t)\} = \{\epsilon'(t)\} + \{\epsilon(t)\} \qquad (67)$$

where $\{\epsilon'(t)\}$ and $\{\epsilon(t)\}$ are zero mean and mutually
uncorrelated, and:

(i) $\{\epsilon'(t)\}$ is continuous in the mean with covariance function
$\Phi^{(n)}(t, t')$.

(ii) $\{\epsilon(t)\}$ is a white noise process.

The statement about $\{\epsilon(t)\}$ can be interpreted in various
alternative ways:

1. $\{\epsilon(t)\}$ has covariance function $R(t)\,\delta(t - t')$.

2. If we define (more rigorously) another random process by

$$\theta(\Delta t, t) = \int_t^{t + \Delta t} \epsilon(\tau) \, d\tau$$

then in the neighborhood of t, $\theta(\Delta t, t)$ has covariance
function

$$E[\theta(\Delta t, t) \, \theta(\Delta t', t)]
  = R(t) \min(\Delta t, \Delta t')$$

3. If R(t) = constant, then $\{\epsilon(t)\}$ has one-sided spectral
density

$$N_o = 2R \qquad (70)$$

It will be assumed that for $t \in (T_1, T_2)$, R(t) is positive
and bounded away from zero. On the other hand, the non-white noise
component may vanish.*

* Recently, the case has been investigated where the white noise
component vanishes.
Now, if the non-white noise component $\epsilon'(t)$ does not vanish,
we will define

$$u^{(n)}(t) = \epsilon'(t) \qquad (71)$$

and write

$$S(t) = \sum_{i=1}^{n} u^{(i)}(t) + \epsilon(t) \qquad (72)$$

(If $\{u^{(n)}(t)\}$ vanishes, the sum in Eq. (72) simply extends to n - 1
instead of n.) Incidentally, the assumption that the processes
$\{u^{(i)}(t)\}$ are mutually uncorrelated is not really essential; it
will be made clear how the same technique could be applied in the
absence of this assumption.

Now, it will be assumed that the quantities to be estimated
are $u^{(i)}(t)$, $t \in (T_1, T_2)$, i = 1, ..., n. (Actually one is
really only interested in i = 1, ..., n - 1, but the technique calls for
treating the non-white noise component as if it were an n-th signal
component to be estimated.)

Let

$$\hat{u}^{(i)}(t, \tau) = \text{linear optimum estimate of } u^{(i)}(t)
  \text{ based on the observational data from } \tau_1 \text{ to } \tau,
  \text{ for any } t \in (T_1, T_2) \qquad (73)$$

The actual observed data are given by Eq. (72). The virtual
observations are given by
$$\bar{u}^{(i)}(t) = 0, \quad i = 1, \ldots, n,
  \quad t \in (T_1, T_2) \qquad (74)$$

and the virtual observation errors are random processes having the
same statistics as $\{u^{(i)}(t)\}$.

The solution for $\hat{u}^{(i)}(t, \tau)$ will be obtained by first
treating the discrete-time case and later passing to the limit.

Thus, let $\{t_\mu\}$ be a set of equally spaced time points in
$(T_1, T_2)$:

$$t_{\mu+1} - t_\mu = \Delta t, \qquad t_1 = T_1 \qquad (75)$$

Let $\{\tau_s\}$ be a set of equally spaced time points in
$(\tau_1, \tau)$:

$$\tau_{s+1} - \tau_s = \Delta t \qquad (76)$$

The parameters to be estimated are $u^{(i)}(t_\mu)$. The actual
observations are given by Eq. (72) with t restricted to the points
$\{\tau_s\}$. The virtual observations are given by Eq. (74) with
t restricted to the points $\{t_\mu\}$. It is assumed that the points
$\{\tau_s\}$ are a subset of $\{t_\mu\}$.

It is assumed that the "zeroth" stage of the recursive procedure
is based only on the virtual observations, that is, on the a-priori
statistics of $\{u^{(i)}(t)\}$. Thus, let

The solution for the discrete-time case then consists of applying Eqs. (62) - (65). This is straightforward, the main difficulty being in keeping all the indices straight. Here, the parameter vector $x = (x_1, \ldots, x_m)$ has components $u^{(i)}(t_\mu)$ arranged in some order. Thus, $m \gg n$ in general.

The result is as follows. It should be noted that the indices $i, j, k, l$ in the following run over $(1, \ldots, n)$; thus, they do not have quite the same meaning as in Eqs. (62) - (65). In fact, the indices $i$ appearing in Eqs. (62) - (65) correspond to pairs of indices $(i, \mu)$ in the following. Also, the functions $C$ below arise from the quantities labeled $D$ in Eqs. (62) - (65). It is best to regard the following equations as self-contained; Eqs. (62) - (65) were merely reproduced to indicate the method of derivation, and the notation of Eqs. (62) - (65) is not necessarily completely consistent with that of the following.


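$$\hat{u}^{(i)}(t_\mu, s) - \hat{u}^{(i)}(t_\mu, s-1) = \frac{\Delta t}{R(\tau_s)} \Big[ S(\tau_s) - \sum_{j=1}^{n} \hat{u}^{(j)}(\tau_s, s-1) \Big] \sum_{k=1}^{n} C_{ik}(t_\mu, \tau_s, s) \tag{80}$$

$$\hat{u}^{(i)}(t_\mu, 0) = 0, \quad \text{all } i, \mu \tag{81}$$

$$C_{ij}(t_\mu, t_\nu, s) = C_{ij}(t_\mu, t_\nu, s-1) - d_i(t_\mu, s)\, d_j(t_\nu, s) \tag{82}$$

$$d_i(t_\mu, s) = \Big[ \frac{\Delta t}{R(\tau_s)} \Big]^{1/2} \sum_{k=1}^{n} C_{ik}(t_\mu, \tau_s, s-1) \Big\{ 1 + \frac{\Delta t}{R(\tau_s)} \sum_{j,k=1}^{n} C_{jk}(\tau_s, \tau_s, s-1) \Big\}^{-1/2} \tag{83}$$

$$C_{ij}(t_\mu, t_\nu, 0) = \delta_{ij}\, \Phi^{(i)}(t_\mu, t_\nu) \tag{84}$$

The case where the random processes $\{u^{(i)}(t)\}$ are mutually correlated is obtained simply by changing Eq. (84) to

$$C_{ij}(t_\mu, t_\nu, 0) = \Phi^{(i,j)}(t_\mu, t_\nu) \tag{84 a}$$

where, denoting a-priori expected value by $E_0(\;)$,

$$\Phi^{(i,j)}(t, t') = E_0\big[ u^{(i)}(t)\, u^{(j)}(t') \big]. \tag{85}$$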


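We will now go to the limit by assuming that $\Delta t \to 0$. Then we get

$$\frac{\partial}{\partial \tau} \hat{u}^{(i)}(t, \tau) = \frac{1}{R(\tau)} \Big[ S(\tau) - \sum_{j=1}^{n} \hat{u}^{(j)}(\tau, \tau) \Big] \times \sum_{k=1}^{n} C_{ik}(t, \tau, \tau) \tag{86}$$

$$\hat{u}^{(i)}(t, T_1) = 0, \quad \text{all } i \text{ and all } t \in (T_1, T_2).$$

$$\frac{\partial}{\partial \tau} C_{ij}(t, t', \tau) = -\frac{1}{R(\tau)} \sum_{k,l=1}^{n} C_{ik}(t, \tau, \tau)\, C_{lj}(\tau, t', \tau) \tag{87}$$

$$C_{ij}(t, t', T_1) = \delta_{ij}\, \Phi^{(i)}(t, t') \tag{88}$$

for all $i$, $j$ and all $t$ and $t'$ in $(T_1, T_2)$.

More generally, if $\{u^{(i)}(t)\}$ are mutually correlated,

$$C_{ij}(t, t', T_1) = \Phi^{(i,j)}(t, t'). \tag{88 a}$$

In the above, $\hat{u}^{(i)}(t, \tau)$ is the estimate of $u^{(i)}(t)$, $t \in (T_1, T_2)$, based on the observational data from $T_1$ to $\tau$. The indices $i, j, k, l$ as previously stated run from one to $n$.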

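In the linear case which has been treated in this section, the estimates $\hat{u}^{(i)}(t, \tau)$ are precisely the minimum mean square error estimates of $u^{(i)}(t)$, based on actual observations up to $\tau$.

The functions $C_{ij}(t, t', \tau)$ have the following interpretation (assuming the equivalent of the conditions of Section II apply):

$$C_{ij}(t, t', \tau) = E\Big\{ \big[ \hat{u}^{(i)}(t, \tau) - u^{(i)}(t) \big] \times \big[ \hat{u}^{(j)}(t', \tau) - u^{(j)}(t') \big] \Big\} \tag{89}$$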

It should also be mentioned that, since $S(t)$ is assumed to contain a white noise process, from the point of view of mathematical rigor Eq. (86) should actually be written with both sides integrated with respect to $\tau$.

The expectation in Eq. (89) can be interpreted in either of two ways: the conventional way, regarding $u^{(i)}(t)$ and $u^{(j)}(t')$ as random variables and $\hat{u}^{(i)}$, $\hat{u}^{(j)}$ as random variables defined on the sample space of the "actual" observations only; or alternately, regarding $u^{(i)}(t)$ and $u^{(j)}(t')$ as constants, and $\hat{u}^{(i)}$, $\hat{u}^{(j)}$ as random variables defined on the sample space of the "actual" and the "virtual" observations.

It is useful to have the discrete-time formulas, Eqs. (80) - (84), since, in the first place, in many applications the observed data will be at discrete times; and second, even in the continuous-time case, the solutions to Eqs. (86) - (88) would generally be built up from a difference equation approximation. Since the possible difference equation approximations are non-unique, Eqs. (80) - (84) indicate the best one (cf. especially Eq. (83), of which the term in braces disappears when $\Delta t \to 0$).
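
The discrete-time recursion can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's program: the function name `recursive_update` and its arguments are hypothetical, the discrete white-noise sample at $\tau_s$ is taken to have variance $R(\tau_s)/\Delta t$, and the gain in Eq. (80) is taken to use the stage-$s$ value of $C$, which is the reading that makes Eqs. (80) - (83) agree with the usual sequential least squares update.

```python
import math

def recursive_update(times, phis, R, dt, observations):
    """Run the recursion over a list of (grid_index, S) observations.

    times: grid points t_mu; phis: one a-priori covariance function per
    component; R: function giving R(t); dt: grid spacing.
    """
    n, m = len(phis), len(times)
    # Eq. (81): zeroth-stage estimates are the (zero) a-priori means.
    u = [[0.0] * m for _ in range(n)]
    # Eq. (84): zeroth-stage C is the a-priori covariance, diagonal in (i, j).
    C = [[[[phis[i](times[a], times[b]) if i == j else 0.0
            for b in range(m)] for a in range(m)]
          for j in range(n)] for i in range(n)]
    for idx, S in observations:  # idx is the grid index of tau_s
        g = dt / R(times[idx])
        # c[i][a] = sum_k C_ik(t_a, tau_s, s-1)
        c = [[sum(C[i][k][a][idx] for k in range(n)) for a in range(m)]
             for i in range(n)]
        total = sum(c[j][idx] for j in range(n))  # sum_{j,k} C_jk(tau_s, tau_s, s-1)
        scale = math.sqrt(g / (1.0 + g * total))
        d = [[scale * c[i][a] for a in range(m)] for i in range(n)]  # Eq. (83)
        # The residual uses the stage s-1 estimates at tau_s.
        resid = S - sum(u[j][idx] for j in range(n))
        for i in range(n):  # Eq. (82)
            for j in range(n):
                for a in range(m):
                    for b in range(m):
                        C[i][j][a][b] -= d[i][a] * d[j][b]
        for i in range(n):  # Eq. (80), with the stage-s C in the gain
            for a in range(m):
                gain = g * sum(C[i][k][a][idx] for k in range(n))
                u[i][a] += gain * resid
    return u, C
```

Each observation is absorbed once and never revisited, which is the one-by-one feature the text relies on; the cost per stage is dominated by the rank-one correction of $C$.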

As an extension of the foregoing, suppose there are $J$ observed processes, as in Eq. (66a).

Suppose the noise processes can, as before, be broken up into non-white and white components. The non-white components will be considered as $J$ additional "signal" processes, and need not be statistically uncorrelated.

In order to preserve the feature of one-by-one addition of observations, it is necessary to assume that the $J$ white noise components are mutually uncorrelated. If this is not true originally, one can transform the problem so that it is true as follows.

At any time $t$, assume there is a non-singular $J \times J$ matrix $M(t)$ which, applied to the $J$-vector of white noise components at time $t$, will transform them into a vector of uncorrelated components.

Then, simply consider the problem where the observations consist of the set of processes $\{S'_j(t)\}$ resulting from applying $M(t)$ to $\{S_j(t)\}$. We would have

$$S'_j(t) = \sum_{i=1}^{n-1} u^{(i)}(t) + \epsilon'_j(t) \tag{66b}$$

but the white noise components would be mutually uncorrelated. (The prime symbol does not indicate differentiation.)
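
One way to construct such a matrix $M(t)$ is sketched below. This is an illustration under assumptions not made in the text: the covariance matrix `Q` of the white noise components at a given time is taken to be known and positive definite, and the inverse Cholesky factor is used as $M$. That choice also normalizes the variances to unity, which is more than the text requires; any matrix that diagonalizes `Q` would serve. The function names are hypothetical.

```python
import math

def cholesky(Q):
    """Lower-triangular L with L L^T = Q, for symmetric positive-definite Q."""
    J = len(Q)
    L = [[0.0] * J for _ in range(J)]
    for i in range(J):
        for j in range(i + 1):
            s = Q[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def whitening_matrix(Q):
    """Return M = L^{-1}, so that M Q M^T is the identity matrix."""
    J = len(Q)
    L = cholesky(Q)
    M = [[0.0] * J for _ in range(J)]
    for col in range(J):  # solve L x = e_col by forward substitution
        for i in range(J):
            rhs = (1.0 if i == col else 0.0) - sum(
                L[i][k] * M[k][col] for k in range(i))
            M[i][col] = rhs / L[i][i]
    return M

def apply_matrix(M, v):
    """Apply M to the J-vector of observed values at one time instant."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]
```

Applying `apply_matrix(M, S)` to the vector of observed values at each time gives transformed observations whose white noise components are uncorrelated, after which they can be added one at a time.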

The most general form of the solution would be as follows. Let $\hat{u}^{(i)}(t, \tau^{(1)}, \ldots, \tau^{(J)})$ be the estimate of $u^{(i)}(t)$ based on observing the received processes $\{S'_j(t)\}$ from $T_1$ to $\tau^{(j)}$, $j = 1, \ldots, J$. Also, $i$ now ranges from 1 to $n + J - 1$. The solution gives the partial derivatives of $\hat{u}^{(i)}$ and $C_{ij}$ with respect to the variables $\tau^{(j)}$, $j = 1, \ldots, J$.

As an example of the analysis in this section, suppose we wish to apply this to estimate the state of a dynamical system with a stochastic driving function. Specifically, suppose

$$A u = \eta \tag{90}$$

where $u$ and $\eta$ are vector random processes and $A$ is a linear operator.

It is unnecessary to assume either that $\eta$ is a process of independent increments or that $A$ is a differential operator. $A$ and $\eta$ are subject only to the conditions that

a. the a-priori covariance function of $u$ be uniquely determined by that of $\eta$ and the form of $A$; and

b. in the continuous-time case, that $u$ be continuous in the mean.

The procedure is to solve for the covariance function of $u$ and then apply the foregoing analysis.
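
As a small illustration of "solving for the covariance function of $u$", the sketch below takes a hypothetical scalar, discrete-time instance of Eq. (90), the first-order difference equation $u_k - a\,u_{k-1} = \eta_k$ with $u_0 = 0$. The operator, the name `covariance_of_u`, and the parameters are assumptions chosen for brevity; the driving covariance `Q` is arbitrary, so $\eta$ need not have independent increments.

```python
def covariance_of_u(a, Q):
    """Covariance of u_1..u_N for u_k - a*u_{k-1} = eta_k with u_0 = 0.

    Q is the N x N covariance matrix of (eta_1, ..., eta_N).
    """
    N = len(Q)
    # G is the inverse of the lower-triangular operator A, so u = G eta.
    G = [[a ** (k - m) if m <= k else 0.0 for m in range(N)] for k in range(N)]
    # Covariance of u is G Q G^T.
    GQ = [[sum(G[j][m] * Q[m][k] for m in range(N)) for k in range(N)]
          for j in range(N)]
    return [[sum(GQ[j][m] * G[k][m] for m in range(N)) for k in range(N)]
            for j in range(N)]
```

The resulting matrix plays the role of the a-priori covariance function in the initial condition, Eq. (84) or Eq. (84 a), after which the recursion proceeds as before.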

Also, the partial differential equations above can, in the Markov case, easily be put into the form of total differential equations with respect to the latest observation time $\tau$, in cases where the estimation time coincides with $\tau$ or is related to it by a fixed lead or lag.


III.  3.  Non-Linear  and  Non-additive  Application 

Suppose the actual observation data is given by

$$S(t) = f\big[ t, u^{(1)}(t), \ldots, u^{(n)}(t) \big] + \epsilon(t), \quad T_1 \le t \le T \tag{91}$$

where $\{u^{(i)}(t)\}$ are mutually uncorrelated continuous-in-the-mean random processes over $(T_1, T_2)$, with covariance functions $\Phi^{(i)}(t, t')$ and means zero; $\epsilon(t)$ is a generalized white noise process with covariance function $R(t)\, \delta(t - t')$, uncorrelated with the processes $\{u^{(i)}(t)\}$, and with zero mean.

It is assumed that some of the processes $\{u^{(i)}(t)\}$ may be considered "signal" and others "noise" from the point of view of any particular application; our technique calls, however, for all of them to be jointly estimated. Also, some of them may be additive and others not.

The recursive generalized least squares estimates $\hat{u}^{(i)}(t, \tau)$ can also be derived for this case by means of Eqs. (62) - (65). To make the first order approximations valid for the non-linear case we must now assume that at any time $\tau$ there exist estimates of $u^{(i)}(t)$ which have small errors. We will assume that such estimates are, in fact, given by $\hat{u}^{(i)}(t, \tau)$. In the discrete-time case this becomes $\hat{u}^{(i)}(t_\mu, s)$.

In short, we will assume in the discrete-time case that $\hat{u}^{(i)}(t_\mu, s) - u^{(i)}(t_\mu)$ is sufficiently small so that

$$f\big[ t, u^{(1)}(t), \ldots, u^{(n)}(t) \big] = f\big[ t, \hat{u}^{(1)}(t, s), \ldots, \hat{u}^{(n)}(t, s) \big] + \sum_{i=1}^{n} \frac{\partial f}{\partial u^{(i)}} \big[ t, \hat{u}(t, s) \big] \big[ u^{(i)}(t) - \hat{u}^{(i)}(t, s) \big] + \text{negligible remainder}. \tag{92}$$

In the continuous-time case, simply replace $\hat{u}^{(i)}(t, s)$ by $\hat{u}^{(i)}(t, \tau)$.

In the non-linear, non-additive case, the resulting generalized least squares estimates can no longer necessarily be stated to be minimum mean square error estimates. However, they can still be said to be asymptotically minimum mean square error if Eq. (92) holds.

The main difference between this and the linear, additive case previously discussed is that the functions $C_{ij}$ now depend on estimates of $u^{(k)}(t)$, $k = 1, \ldots, n$. The necessary modifications of Eqs. (80) - (84) or Eqs. (86) - (89) are as follows:

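In the discrete-time case, writing as usual $\hat{u} = (\hat{u}^{(1)}, \ldots, \hat{u}^{(n)})$,

$$\hat{u}^{(i)}(t_\mu, s) - \hat{u}^{(i)}(t_\mu, s-1) = \frac{\Delta t}{R(\tau_s)} \Big\{ S(\tau_s) - f\big[ \tau_s, \hat{u}(\tau_s, s-1) \big] \Big\} \sum_{k=1}^{n} C_{ik}(t_\mu, \tau_s, s)\, \frac{\partial f}{\partial u^{(k)}} \big[ \tau_s, \hat{u}(\tau_s, s-1) \big] \tag{93}$$

$$\hat{u}^{(i)}(t_\mu, 0) = 0, \quad \text{all } i \text{ and } \mu$$

$$C_{ij}(t_\mu, t_\nu, s) = C_{ij}(t_\mu, t_\nu, s-1) - d_i(t_\mu, s)\, d_j(t_\nu, s) \tag{94}$$

$$d_i(t_\mu, s) = \Big[ \frac{\Delta t}{R(\tau_s)} \Big]^{1/2} \sum_{k=1}^{n} C_{ik}(t_\mu, \tau_s, s-1)\, \frac{\partial f}{\partial u^{(k)}} \big[ \tau_s, \hat{u}(\tau_s, s-1) \big] \Big\{ 1 + \frac{\Delta t}{R(\tau_s)} \sum_{j,k=1}^{n} \frac{\partial f}{\partial u^{(j)}} \big[ \tau_s, \hat{u}(\tau_s, s-1) \big]\, C_{jk}(\tau_s, \tau_s, s-1)\, \frac{\partial f}{\partial u^{(k)}} \big[ \tau_s, \hat{u}(\tau_s, s-1) \big] \Big\}^{-1/2} \tag{95}$$

$$C_{ij}(t_\mu, t_\nu, 0) = \delta_{ij}\, \Phi^{(i)}(t_\mu, t_\nu) \tag{96}$$

When $\Delta t \to 0$, we get

$$\frac{\partial}{\partial \tau} \hat{u}^{(i)}(t, \tau) = \frac{1}{R(\tau)} \Big\{ S(\tau) - f\big[ \tau, \hat{u}(\tau, \tau) \big] \Big\} \sum_{k=1}^{n} C_{ik}(t, \tau, \tau)\, \frac{\partial f}{\partial u^{(k)}} \big[ \tau, \hat{u}(\tau, \tau) \big] \tag{97}$$

$$\hat{u}^{(i)}(t, T_1) = 0 \tag{98}$$

$$\frac{\partial}{\partial \tau} C_{ij}(t, t', \tau) = -\frac{1}{R(\tau)} \sum_{k,l=1}^{n} C_{ik}(t, \tau, \tau)\, \frac{\partial f}{\partial u^{(k)}} \big[ \tau, \hat{u}(\tau, \tau) \big]\, \frac{\partial f}{\partial u^{(l)}} \big[ \tau, \hat{u}(\tau, \tau) \big]\, C_{lj}(\tau, t', \tau) \tag{99}$$

$$C_{ij}(t, t', T_1) = \delta_{ij}\, \Phi^{(i)}(t, t') \tag{100}$$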

As before, if the assumption is dropped that $\{u^{(i)}(t)\}$ are mutually uncorrelated, then Eq. (96) and Eq. (100) are replaced by Eqs. (84 a) or (88 a) respectively.
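
A single stage of the linearized recursion can be sketched as follows. This is a hedged illustration, not the author's implementation: the pairs $(i, \mu)$ are flattened into one parameter vector `x`, as described for Eqs. (62) - (65); `f` and `grad_f` are hypothetical callables returning the observation function and its partial derivatives at the current estimate; `g` stands for $\Delta t / R(\tau_s)$; and the gain is taken to use the updated covariance, as in the linear case.

```python
import math

def linearized_update(x, C, f, grad_f, S, g):
    """One stage of the linearized recursion, with g = dt / R(tau_s)."""
    m = len(x)
    h = grad_f(x)  # partial derivatives evaluated at the stage s-1 estimate
    Ch = [sum(C[a][b] * h[b] for b in range(m)) for a in range(m)]
    hCh = sum(h[a] * Ch[a] for a in range(m))
    scale = math.sqrt(g / (1.0 + g * hCh))
    d = [scale * Ch[a] for a in range(m)]                    # cf. Eq. (95)
    C_new = [[C[a][b] - d[a] * d[b] for b in range(m)]
             for a in range(m)]                              # cf. Eq. (94)
    resid = S - f(x)           # observation minus prediction at stage s-1
    C_new_h = [sum(C_new[a][b] * h[b] for b in range(m)) for a in range(m)]
    x_new = [x[a] + g * resid * C_new_h[a] for a in range(m)]  # cf. Eq. (93)
    return x_new, C_new
```

With `f` linear in `x` this reduces to the linear, additive recursion. Note that the sketch takes a starting estimate `x` as an argument: if the gradient vanishes at the starting point, no update occurs, which is one reason the text requires that estimates with small errors already exist.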


IV.  Further  Problems 

Several  areas  of  further  research  are  suggested  by  the  foregoing. 
Two  specific  areas  of  useful  investigation  are,  first,  use  of  the 
recursive  framework  to  treat  problems  calling  for  adaptive  estimation 
methods,  and  second,  investigation  of  techniques  for  obtaining  exact 
(not  merely  asymptotic)  minimum  mean  square  error  estimates  in  the 
non-linear  case. 


Adaptive  Estimation  Methods 

As used here, an adaptive estimation problem refers to one in which the a-priori statistics of the observation errors, or the statistics of the signals if these are regarded as random processes, are not known exactly.

The generalized least squares methods described above, especially in their recursive form, may provide a convenient framework for treating such problems, as has been recognized by many workers in this field.

For example, considering the problems treated in Section III, suppose that the covariance functions $\Phi^{(i)}(t, t')$ and the function $R(t)$ are not known exactly. How would one modify the recursive, generalized least squares procedure to incorporate a feature whereby these covariance functions are estimated from the data and these estimates then incorporated in the procedure? One possible avenue of approach is as follows.

Suppose the estimation procedure is initiated simply by assuming some set of functions $\Phi^{(i)}(t, t')$ and $R(t)$ to insert into the recursive equations. As previously discussed, the resulting recursive estimates are still identical (to first order) to some set of generalized least squares estimates. However, these estimates in effect are obtained by minimizing a function $Q$ in which the matrix $\eta$ is not the true inverse covariance matrix of the observation errors (actual and virtual).

However, this may still result in reasonably good estimates, even though they will not be the best possible. Moreover, the machinery exists (Ref. 2) by which it is possible to compute the estimation error covariances as a function of the deviation between the true covariance matrices and the assumed ones.

Now, this procedure will result in estimates $\hat{u}^{(i)}(t, \tau)$ of the random variables $u^{(i)}(t)$. Also, it can be used to produce estimates $\hat{\epsilon}(t, \tau)$ of the "white noise" component, since $\hat{\epsilon}(t, \tau) = S(t) - \sum_i \hat{u}^{(i)}(t, \tau)$. (In the continuous-time case, the estimate $\hat{\epsilon}(t, \tau)$ would have to be interpreted as an estimate of some suitably smoothed version of the white noise, since, technically, the white noise has infinite variance at a single instant.)

Next, the estimates $\hat{u}^{(i)}(t, \tau)$ and $\hat{\epsilon}(t, \tau)$ can be used to estimate the covariance functions $\Phi^{(i)}(t, t')$ and $R(t)$. This statement is clear in case the random processes $\{u^{(i)}(t)\}$ and $\{\epsilon(t)\}$ are stationary, in which case estimates of their covariance functions can be made from a single time sample.
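
For the stationary case, a covariance estimate from a single time sample can be sketched as below. The estimator shown, the biased sample autocovariance of a zero-mean series, is one common choice and is an assumption of this sketch; the text does not prescribe a particular estimator, and the name `sample_autocovariance` is hypothetical.

```python
def sample_autocovariance(x, max_lag):
    """Biased sample autocovariance of a zero-mean series for lags 0..max_lag."""
    N = len(x)
    return [sum(x[t] * x[t + k] for t in range(N - k)) / N
            for k in range(max_lag + 1)]
```

Here `x` would hold the estimates $\hat{u}^{(i)}$ or $\hat{\epsilon}$ sampled on the time grid, and the returned values approximate the covariance function at multiples of the grid spacing.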

If they are non-stationary, it is still possible to get estimates of the covariance functions from a single time sample, but in general these would be very poor. However, if these unknown processes can be considered to be simple transformations of stationary processes, such as integrals of stationary processes, then the covariance function can be more accurately estimated from a single time sample.

It still remains to specify how one would incorporate the resulting estimates of the covariance functions (or possibly some composite estimates, depending on both the covariance functions assumed a-priori and those estimated from the data) into the overall estimation procedure.

Since, in the recursive procedure, the functions $\Phi^{(i)}(t, t')$ enter into the procedure only via the initial condition equations (84), (88), (96), or (100), one approach would be, periodically, to go back and solve for the functions $C_{ij}(t, t', \tau)$ over again, using $\hat{\Phi}^{(i)}(t, t', \tau)$ and $\hat{R}(t, \tau)$ in place of the initially assumed $\Phi^{(i)}(t, t')$, $R(t)$ in these equations. Here, $\hat{\Phi}$ and $\hat{R}$ refer to covariance function estimates making use of data up to time $\tau$. The resulting recomputed values of $C_{ij}(t, t', \tau)$ would then be used from that point on in Eqs. (80), (86), (93), or (97). In part, this would detract from the recursive feature, since it would involve re-solving for $C_{ij}(t, t', \tau)$. However, it still preserves the recursive feature insofar as processing of new observational data is concerned (at least, this is true in the linear case), since it does not require any re-processing of the old observational data.

Insofar as the function $R(t)$ is concerned, it would appear necessary to make some sort of assumption that $R(t)$ is slowly varying. Some specific problems to be investigated are

a) Computation of the degree to which the accuracy of the estimates $\hat{u}^{(i)}(t, \tau)$ is degraded if the initially assumed $\Phi^{(i)}(t, t')$ and $R(t)$ differ from the true covariance functions of $\{u^{(i)}(t)\}$ and $\{\epsilon(t)\}$. The machinery for this already exists (Ref. 2).

b) Computation of the degree to which the initially assumed functions can be improved on the basis of the observed data, whether and under what conditions the resulting estimates $\hat{\Phi}$ and $\hat{R}$ actually approach the true covariance functions, and if they do, how rapidly as $\tau$ increases.

c) Devising computationally convenient ways of incorporating improved estimates of the covariance functions into the procedure.


Exact  Minimum  Mean  Square  Error  Estimates 

Even if the conditions of Section II are satisfied, the generalized least squares estimation procedures do not give the precise minimum mean square error estimates for the non-linear case. However, it is known, at least formally, what the exact minimum mean square error estimates are. They are the estimates formed by finding the expected values of $x_i$ with respect to the a-posteriori p.d.f. of $(x_1, x_2, \ldots)$ based on the observational data.

Now, in some cases the mean of $p(x_1, x_2, \ldots \mid S)$ occurs for the same values of $x_i$ as the maximum. In these cases, the MAP estimates are the minimum mean square error estimates. This is true in the linear case. However, in general it is not the case.
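
The gap between the a-posteriori mean and the a-posteriori maximum is easy to see numerically. The sketch below uses a hypothetical skewed density proportional to $x e^{-x}$ on $x > 0$, chosen only for illustration and not taken from the text; its maximum is at $x = 1$ while its mean is $2$. The helper `mean_and_mode` is likewise an assumed name.

```python
import math

def mean_and_mode(density, lo, hi, steps):
    """Mean and mode of an unnormalized density, by the midpoint rule on a grid."""
    h = (hi - lo) / steps
    xs = [lo + (k + 0.5) * h for k in range(steps)]
    ps = [density(x) for x in xs]
    norm = sum(ps)
    mean = sum(x * p for x, p in zip(xs, ps)) / norm
    mode = xs[max(range(steps), key=lambda k: ps[k])]
    return mean, mode

# Hypothetical skewed a-posteriori density, proportional to x * exp(-x).
mean, mode = mean_and_mode(lambda x: x * math.exp(-x), 0.0, 40.0, 40000)
```

For a symmetric unimodal density such as the Gaussian the two values coincide, which is why the MAP and minimum mean square error estimates agree in the linear case.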

If one attempts to find the conditional means in the non-linear case, even for Gaussian additive noise, one quickly gets into analytical and computational problems of great difficulty.

Incidentally, the exact minimum variance unbiased estimates can be obtained from Barankin's method, the application of which to stochastic processes is analytically tractable up to a point. However, in general, the Barankin minimum variance unbiased estimates are not the minimum mean square error estimates.

While the development of analytically or computationally tractable methods for finding the conditional means has been the subject of a considerable literature, further investigation of this problem would be very useful (and in fact forms the subject matter of Ref. 16).

Other  Applications 

A number of applications can be thought of to problems in which there are various mixtures, all in the same problem, of parameters with which a-priori statistics are associated and those with which no a-priori statistics are associated; random processes involving mixtures of infinite sets of unknown parameters and additional finite sets of parameters; and mixtures of discrete-time and continuous-time observational data or other heterogeneous types of observational data. The foregoing results provide a systematic framework for treating large classes of these problems, including setting up recursive solutions.

The problem treated in Appendix A with its variations provides one set of examples. Numerous other specific applications can readily be thought of.


Appendix A. Estimation of Rotation Rate

This problem is introduced as an example of a problem in which a smoothed estimate of a parameter can be obtained, even though there are an infinite number of other unknown parameters, and even though no a-priori statistics are associated with the unknown parameters.

Suppose a radar target is rotating about a fixed axis. The returned signal amplitude is observed by the radar, and the object is to estimate the rotation rate $\omega$. The radar cross-section of the target will be assumed to vary with viewing aspect. The estimate of $\omega$ is to be made only by observing the fluctuations in amplitude of the returned signal (and not by means of spreading of the doppler spectrum, for example).

It is assumed that nothing is known a-priori about the form of the radar cross-section vs. aspect (with two minor exceptions to be noted below).

The received signal is assumed to be

$$S(t) = \sigma(\omega t) + \epsilon(t) \tag{A 1}$$

where $\epsilon(t)$ is a white noise process with one-sided power spectral density $Y$.

If we write

$$\theta = \omega t \tag{A 2}$$

it is assumed that

(i) $\sigma(\theta)$ is periodic in $\theta$ with period $2\pi$ (that is, with period $2\pi/\omega$ in $t$) but not with any smaller period.

(ii) $\theta\, \sigma'(\theta)$ is square integrable over $(0, 2\pi)$.

Other than this, no knowledge is assumed about $\sigma$.

Now, let $\{F_i(\theta)\}$ be a complete set of orthonormal functions in $L^2(0, 2\pi)$. Then we can write

$$\sigma(\theta) = \sum_{i=1}^{\infty} \alpha_i F_i(\theta) \tag{A 3}$$

Thus, the received signal is

$$S(t) = f(t, \omega, \alpha_1, \alpha_2, \ldots) + \epsilon(t), \quad 0 \le t \le T \tag{A 4}$$

where

$$f(t, \omega, \alpha_1, \alpha_2, \ldots) = \sum_{i=1}^{\infty} \alpha_i F_i(\omega t) \tag{A 5}$$

We now wish to find $E[\hat{\omega} - \omega]^2$ for the maximum likelihood estimate $\hat{\omega}$ of rotation rate. The approach will be to apply the formulas of Ref. 2 for the white noise case. This approach leads to the following.

$$E[\hat{\omega} - \omega]^2 = (B^{-1})_{00} \tag{A 6}$$

with asymptotic equality,
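
where

$$B_{ij} = \frac{2}{Y} \int_0^T \frac{\partial f}{\partial x_i} \frac{\partial f}{\partial x_j}\, dt \tag{A 7}$$

In Eq. (A 7),

$$i = 0, 1, 2, \ldots; \quad j = 0, 1, 2, \ldots; \quad x_0 = \omega; \quad x_i = \alpha_i, \; i > 0.$$

Now, for convenience, suppose the observation time $T$ is such that it is an integral number of periods:

$$T = \frac{2\pi N}{\omega} \tag{A 8}$$

Then

$$B_{00} = \frac{2}{Y \omega^3} \int_0^{2\pi N} \big[ \theta\, \sigma'(\theta) \big]^2\, d\theta \tag{A 9}$$

$$B_{0i} = \frac{2}{Y \omega^2} \int_0^{2\pi N} \theta\, \sigma'(\theta)\, F_i(\theta)\, d\theta, \quad i > 0 \tag{A 10}$$

$$B_{ij} = \frac{2N}{Y \omega}\, \delta_{ij}, \quad i, j > 0 \tag{A 11}$$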


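One can now compute $(B^{-1})_{00}$ by truncating the matrix $B$ to an $n \times n$ matrix, finding $(B^{-1})_{00}$ for this $n$-th order matrix, and then letting $n \to \infty$. The result is

$$(B^{-1})_{00} = \Big[ B_{00} - \sum_{i=1}^{\infty} \frac{B_{0i}^2}{B_{ii}} \Big]^{-1} \tag{A 12}$$

Using Eq. (A 9) - Eq. (A 11), this results in

$$E[\hat{\omega} - \omega]^2 = \frac{Y \omega^3}{2} \Big\{ \int_0^{2\pi N} \big[ \theta\, \sigma'(\theta) \big]^2\, d\theta - \frac{1}{N} \sum_{i=1}^{\infty} \Big[ \int_0^{2\pi N} \theta\, \sigma'(\theta)\, F_i(\theta)\, d\theta \Big]^2 \Big\}^{-1} \tag{A 13}$$

or, using Eq. (A 8), an equivalent form is

$$E[\hat{\omega} - \omega]^2 = \frac{Y}{2} \Big( \frac{2\pi}{T} \Big)^3 \Big\{ \int_0^{2\pi} \big[ \theta\, \sigma'(N\theta) \big]^2\, d\theta - \sum_{i=1}^{\infty} \Big[ \int_0^{2\pi} \theta\, \sigma'(N\theta)\, F_i(N\theta)\, d\theta \Big]^2 \Big\}^{-1} \tag{A 14}$$

It might be noted from Eq. (A 13) or Eq. (A 14) that $E[\hat{\omega} - \omega]^2 = \infty$ for $N = 1$, i.e., for $T$ less than or equal to one period. That this is what should happen is clear.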
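
The behavior at $N = 1$ can be checked numerically. The sketch below evaluates the quantity whose inverse appears in the error formula, namely $\int_0^{2\pi N} [\theta \sigma'(\theta)]^2 d\theta - \frac{1}{N} \sum_i [\int_0^{2\pi N} \theta \sigma'(\theta) F_i(\theta) d\theta]^2$. It rests on two assumptions that go beyond the text: the sum over $i$ is collapsed by Parseval's relation (valid because $\{F_i\}$ is complete and orthonormal on one period, and is extended periodically), and the cross-section is the hypothetical $\sigma(\theta) = \cos\theta$. The function name `bracket` is also hypothetical.

```python
import math

def bracket(sigma_prime, N, steps=2000):
    """Term whose inverse gives the error variance, summed over i via Parseval."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h
        # g[p] is theta * sigma'(theta) evaluated on the p-th period.
        g = [(theta + 2.0 * math.pi * p) * sigma_prime(theta) for p in range(N)]
        g_bar = sum(g) / N  # average over the N periods
        # Parseval: (1/N) * sum_i [...]^2 equals N * integral of g_bar^2.
        total += (sum(v * v for v in g) - N * g_bar * g_bar) * h
    return total

sigma_prime = lambda theta: -math.sin(theta)  # derivative of cos(theta)
one_period = bracket(sigma_prime, 1)
two_periods = bracket(sigma_prime, 2)
```

With a single period the term vanishes, so the error variance is unbounded: one period of an entirely unknown waveform carries no information about its repetition rate. With two or more periods the term is positive and grows rapidly with $N$.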

Now, suppose we compare Eq. (A 13) or Eq. (A 14) with the answer to be obtained when $\sigma(\theta)$ is a-priori known exactly. It might be noted that in practical cases, even if the nature of the scattering object were well known, a realistic assumption would have the absolute amplitude and the initial phase of rotation as unknown parameters. That is, a realistic model would have

$$S(t) = \alpha\, \sigma[\omega(t - t_0)] + \epsilon(t) \tag{A 15}$$

with $\alpha$, $\omega$, and $t_0$ unknown a-priori.

In the case where $\sigma$ was assumed to be completely unknown, treated above, it was unnecessary to assume additional unknown parameters $\alpha$ and $t_0$ since this was automatically taken care of by assuming that all the $\alpha_i$ in Eq. (A 3) were unknown.

The formulas (Ref. 2) for treating Eq. (A 15) are simple to apply. However, for present purposes, let us make the somewhat unrealistic assumption that $\omega$ is the only unknown parameter. (The answer will then be a lower bound for the case where $\alpha$, $\omega$, and $t_0$ are all unknown.) Thus, instead of Eq. (A 15), we will assume

$$S(t) = \sigma(\omega t) + \epsilon(t) \tag{A 16}$$

where $\sigma$ is a known function.

Then,

$$E[\hat{\omega} - \omega]^2 = \frac{Y \omega^3}{2} \Big[ \int_0^{2\pi N} \big[ \theta\, \sigma'(\theta) \big]^2\, d\theta \Big]^{-1} = \frac{Y}{2} \Big( \frac{2\pi}{T} \Big)^3 \Big[ \int_0^{2\pi} \big[ \theta\, \sigma'(N\theta) \big]^2\, d\theta \Big]^{-1} \tag{A 17}$$

Now, finally, consider the case where $\sigma(\theta)$ is considered to be partly known. The formulation would be

$$S(t) = \sigma_a(\omega t) + \alpha\, \sigma_b[\omega(t - t_0)] + \epsilon(t) \tag{A 18}$$

where $\sigma_a$ is completely unknown; $\sigma_b$ is known; and $\alpha$, $\omega$, $t_0$ are unknown.

It turns out, however, that if the equation for $E[\hat{\omega} - \omega]^2$ is applied in this case, without any further assumption of a-priori knowledge, the answer comes out exactly the same as for the case first treated, that is, the case where the total function $\sigma$ is entirely unknown. This is also true if $\alpha$ and $t_0$ are assumed known. The reason is, of course, that once $\sigma_a$ is considered to be completely unknown, that is, that the coefficients in the expansion of $\sigma_a$ relative to $\{F_i\}$ can be anything, this is equivalent to saying that the expansion coefficients of $\sigma_a + \alpha \sigma_b$ can be anything.

Thus, it turns out that the problem where some portion of $\sigma$ is known exactly cannot be properly formulated, in such a manner as to reflect the benefit of such knowledge, without associating significantly non-uniform a-priori statistics with the unknown part of $\sigma$.

For simplicity of exposition, assume that

$$S(t) = \sigma_a(\omega t) + \sigma_b(\omega t) + \epsilon(t) \tag{A 19}$$

where

$$\sigma_a(\theta) = \sum_{i=1}^{\infty} \alpha_i F_i(\theta) \tag{A 20}$$

$$\sigma_b(\theta) = \sum_{i=1}^{\infty} \beta_i F_i(\theta) \tag{A 21}$$

and where all $\beta_i$ are known a-priori.

Suppose that no a-priori statistics are associated with $\omega$. However, suppose $\sigma_a(\theta)$ is known to be a random process having known mean and covariance function over $(0, 2\pi)$. Since the mean of $\sigma_a(\theta)$ is assumed known, we can replace $\sigma_a(\theta)$ by $\sigma_a(\theta) - E_0[\sigma_a(\theta)]$ and assume the mean to be zero.

Now, $\{F_i(\theta)\}$ has heretofore been considered a completely arbitrary complete orthonormal set in $L^2(0, 2\pi)$. At this point, we will make the following specific choice of $\{F_i(\theta)\}$: the orthonormal eigenfunctions of the covariance function $\Gamma_a(\theta, \theta')$ of the random process $\sigma_a(\theta)$ over $(0, 2\pi)$.

In that case, the set $\{\alpha_i\}$ have a-priori statistics

$$E_0(\alpha_i) = 0, \quad \text{all } i; \qquad E_0(\alpha_i \alpha_j) = \sigma_i^2\, \delta_{ij} \tag{A 22}$$

where $\sigma_i^2$ are related to the eigenvalues of the kernel $\Gamma_a(\theta, \theta')$.

We now have the actual observations given by

$$S(t) = f(t, \omega, \alpha_1, \alpha_2, \ldots) + \epsilon(t) \tag{A 23}$$

$$f(t, \omega, \alpha_1, \alpha_2, \ldots) = \sum_{i=1}^{\infty} (\alpha_i + \beta_i)\, F_i(\omega t) \tag{A 24}$$

where $\{\beta_i\}$ are all known constants.

The virtual observations are given by

$$\alpha_i^{(e)} = \alpha_i + \epsilon_i^{(e)} \tag{A 25}$$

where the observed values of $\alpha_i^{(e)}$ are $\alpha_i^{(e)} = 0$ and the errors $\epsilon_i^{(e)}$ have covariance matrix given by Eq. (A 22).

If we now apply the generalized least squares smoothing technique to the set of observations consisting of both the actual and the virtual observations, we obtain the result that $E[\hat{\omega} - \omega]^2$ is given by Eqs. (A 6), (A 9), (A 10), (A 12), and a modified version of Eq. (A 11):

$$B_{ij} = \Big( \frac{2N}{Y \omega} + \frac{1}{\sigma_i^2} \Big) \delta_{ij}, \quad i, j > 0 \tag{A 11a}$$

Also, in applying these formulas,

$$\sigma(\theta) = \sigma_a(\theta) + \sigma_b(\theta) \tag{A 26}$$
Caution must be used in interpreting this result, since it actually amounts to getting a lower bound on $E[\hat{x} - x]^2$ relative to the fictitious statistical ensemble of the actual and virtual observations. As pointed out in Section II, the statement that this can be considered equivalent to $E[\hat{x} - x]^2$ relative to the original statistical ensemble (defined by the statistics of $\{\epsilon(t)\}$ and the a-priori statistics of $\{\sigma_a(\theta)\}$) has been proved only if certain first order expansions are valid. The precise conditions under which the result stated is valid, for the case where non-uniform a-priori statistics are associated with a portion of $\sigma$, have not been investigated.